SUMMARY

Introduction 

In  the  last  five  years,  over  a thousand  books,  articles  and  technical 
reports  have  been  published  describing  how  people  make  decisions  and 
how  they  can  be  helped  to  make  better  decisions.  Many  of  them  were 
the product of ARPA's decision analysis program. This report provides
a critical overview of this work, its applicability, and directions for
further research.

Background  and  Approach 

Traditionally,  decision  makers  have  been  guided  by  their  own  intuitions, 
trial-and-error  and  habits.  In  a rapidly  changing  environment,  our 
intuitions  may  be  faulty  and  our  habitual  responses  invalid.  With 
decisions  of  great  consequence,  we  often  cannot  afford  the  luxury  of 
any errors. Over the last fifteen years, psychologists, decision
analysts, economists, and others have made a concerted effort to
understand how people make decisions and to develop analytical techniques
to help them make better decisions. ARPA's Decision Analysis research
program  has  been  at  the  forefront  of  this  effort. 

As might be expected with a burgeoning, multi-disciplinary field, it
is difficult to keep track of all that is done and to draw the
implications of research done in one discipline for research done in
another. This report surveys the entire field (over a thousand papers
were read and over three hundred are cited), asking (1) what is known?
(2) what good is it? and (3) what else must we learn? Particular
attention is given to work integrating research describing how people
do make decisions with normative work that prescribes how people should
make decisions.

Topics covered include descriptive work on probabilistic judgment,
riskless choice, risk-taking, decision-making policies, and dynamic
decision making, as well as normative models for assessing probabilities,
choosing between multi-attributed alternatives, decision analysis,
and man/machine systems. Since much of the descriptive work is the
result of laboratory experiments, that section concludes with a
discussion of whether laboratory findings can be generalized to
non-laboratory settings. The acid tests of a normative model are whether
or not it is used and whether or not it works. Thus, the paper
concludes with a discussion of those issues.

Findings  and  Implications 

The major advance in descriptive research over the last five years has
been the discovery that people systematically violate the principles
of rational decision making when judging probabilities, making
predictions, or otherwise attempting to cope with probabilistic tasks.
Biases in judgments of uncertain events are often large and difficult to
eliminate. The source of these biases can be traced to various
heuristics or mental strategies that people use to process information.
The study of heuristics is quite new and it appears that many
important discoveries lie ahead. In particular, there is a need for
integrating theory to tie these heuristics together and to interpret them
in the context of other knowledge about thought processes. Work has
only just begun on developing techniques to eliminate these biases or
to ameliorate their effects. In the final discussion, a strong case
is made that judgmental biases affect important decisions in the real
world; numerous examples are provided.

The discovery of these heuristics and biases has two important
implications for applied decision making. First, they point to the need
for  helping  decision  makers  avoid  the  biases,  both  by  training  and 
by  decision-aiding  techniques.  Second,  they  warn  decision  analysts 
against  uncritically  accepting  intuitive  judgments  provided  by  the 
decision  makers  whom  they  are  aiding. 


The major advance in normative work has been the development of
increasingly sophisticated decision-aiding techniques and their
application to an increasingly diverse set of problems. The state of the
art has been advanced by greater understanding of the theoretical
underpinnings of these techniques, by the development of computerized
techniques for performing analyses and assessing the effect of changing
model parameters, and by the opportunity decision analysts have had to
get out in the field, interact with decision makers and see how their
techniques work. Because information about the effectiveness of applied
decision analysis is often proprietary and because evaluation methods
seem poorly developed, this topic is only lightly touched. The
increased acceptance of decision analysis by decision makers in diverse
fields is some indication of its versatility and usefulness.

The major challenges for normative work appear to be providing
systematic evaluations of the effectiveness of decision-aiding techniques,
incorporating descriptive research into those techniques, and applying
them to problems of national importance.

Behavioral decision theory has two interrelated facets, normative
and  descriptive.  The  normative  theory  is  concerned  with  prescribing 
courses  of  action  that  conform  most  closely  to  the  decision  maker's 
beliefs  and  values.  Describing  these  beliefs  and  values  and  the  manner 
in  which  individuals  incorporate  them  into  their  decisions  is  the  aim 
of  descriptive  decision  theory. 

This review is organized around these two facets. The first
section deals with descriptive studies of judgment, inference and choice;
the second section discusses the development of decision-aiding
techniques.

This is the fourth survey of this topic to appear in the Annual
Review. Its predecessors were articles by Edwards (78), Becker &
McClintock (24), and Rapoport & Wallsten (224). The present review
covers publications appearing between January 1, 1971 and December
31, 1975, with occasional exceptions.

As we reviewed the literature, several trends caught our attention.
One is that decision making is being studied by researchers from an
increasingly diverse set of disciplines, including medicine, economics,
education, political science, geography, engineering, marketing, and
management science, as well as psychology. Nevertheless, the importance
of psychological concepts is increasing, in both the normative and
descriptive work. Whereas past descriptive studies consisted mainly
of rather superficial comparisons between actual behavior and
normative models, research now focuses on the psychological underpinnings
of observed behavior. Likewise, the prescriptive enterprise is being
psychologized by challenges to the acceptability of the fundamental
axioms of utility theory (140, 188, 256).

Second, increasing effort is being devoted to the development
of practical methods for helping people cope with uncertainty. Here,
psychological research provides guidance about how to elicit the
judgments needed for decision-aiding techniques.

Third, the field is growing rapidly, as evidenced by the numerous
reviews and bibliographies produced during the past five years. Slovic
and Lichtenstein (254) reviewed the literature on Bayesian and
regression approaches to studying information processing in decision making
and judgment; Dillon (73) covered utility theory with a view towards
its application in agricultural contexts; MacCrimmon (187) examined
work in management decision making; Shulman and Elstein (247) discussed
the implications of judgment and decision making research for teachers;
Nickerson and Feehrer (209) searched for studies relevant to the training
of decision makers (since there aren't many, they settled for a general
review); Vlek and Wagenaar (292) surveyed the entire field; and
Kozielecki (157) and Lee (165) have provided its first textbooks.

A selective and annotated bibliography on Behavioral Decision
Theory has been compiled by Barron (18). Kusyszyn (161, 162) has
provided bibliographies covering the psychology of gambling, risk-taking,
and subjective probability. Houle (124) has accumulated a massive
bibliography on Bayesian statistics and related behavioral work, which
by 1975 included 106 specialized books, 1322 journal articles, and
about 800 other publications. Kleiter, Gachowetz & Huber (153) are
presently assembling the most complete bibliography ever in this field;
it may well be finished by the time you read this. They generously
supplied us with more than 1000 relevant references, all produced
between 1971 and 1975.

To  ease  cognitive  strain,  we  have  focused  on  psychological  aspects 
of  individual  judgment  and  decision  making.  Thus,  we  omit  group  and 
organizational  decision  making,  Bayesian  statistics,  and  much  of  the 
work  on  the  axiomatic  formulations  of  decision  theory.  Game  theory 
is  reviewed  elsewhere  in  this  volume.  Even  with  this  narrow  focus, 
we  have  had  to  limit  our  coverage  severely,  concentrating  on  those 
references  to  which  our  prejudices  have  led  us. 

DESCRIPTIVE RESEARCH

Probabilistic Judgment

Because  of  the  importance  of  probabilistic  reasoning  to  decision 
making,  considerable  effort  has  been  devoted  to  studying  how  people 
perceive,  process  and  evaluate  the  probabilities  of  uncertain  events. 
Early  research  on  "intuitive  statistics"  led  Peterson  & Beach  (218) 
to  an  optimistic  conclusion: 

. . . man  gambles  well.  He  survives  and  prospers  while  using 
. . . fallible  information  to  infer  the  states  of  his  uncertain 
environment  and  to  predict  future  events  (p.  29). 

Experiments that have compared human inferences with those
of statistical man show that the normative model provides a
good first approximation for a psychological theory of
inference. Inferences made by subjects are influenced by
appropriate variables in appropriate directions (pp. 42-43).

MODEL-BASED PARADIGMS One result of this high regard for our
intellectual capability has been a reliance on normative models in
descriptive research. Thus, Barclay, Beach & Braithwaite (15) proposed
beginning with a normative model and adjusting its form or parameters
to produce a descriptive model. This approach is best exemplified
by the study of conservatism: the tendency, when integrating
probabilistic information, to produce posterior probabilities nearer the
prior probabilities than those specified by Bayes' theorem. In 1971,
conservatism was identified as the primary finding of Bayesian
information integration research (254). Reports of the phenomenon have
continued to appear in tasks involving normally distributed populations
(75, 290, 305), and in that old favorite, the binomial (bookbag and
poker chip) task (3, 196). Even filling the bookbags with male and
female Polish surnames fails to lessen the effect (262). Donnell &
DuCharme's (75) subjects became optimal when told the normative response,
but when the task changed, their learning failed to generalize. As
the next section shows, conservatism occurs only in certain kinds of
inference tasks. In a variety of other settings, people's inferences
are too extreme.
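
The normative benchmark against which conservatism is measured can be
made concrete with a small calculation. The following is a minimal
sketch in Python of Bayes' theorem applied to a bookbag and poker chip
task. The bag compositions (70% and 30% red chips), the equal priors,
the sample, and the function name are illustrative assumptions, not
values taken from the studies cited above.

```python
def posterior_mostly_red(n_red, n_blue, p_red_a=0.7, p_red_b=0.3, prior_a=0.5):
    """Posterior probability that the chips came from bag A.

    Assumed setup (hypothetical): bag A holds a proportion p_red_a of red
    chips, bag B a proportion p_red_b, and chips are drawn with replacement.
    """
    # Likelihood of the observed sample under each bag. The binomial
    # coefficient is the same for both bags and cancels, so it is omitted.
    like_a = (p_red_a ** n_red) * ((1 - p_red_a) ** n_blue)
    like_b = (p_red_b ** n_red) * ((1 - p_red_b) ** n_blue)
    # Bayes' theorem for two hypotheses.
    return prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)


if __name__ == "__main__":
    # A sample of 8 red and 4 blue chips.
    print(round(posterior_mostly_red(8, 4), 3))
```

Under these assumptions a sample of 8 red and 4 blue chips yields a
posterior of about .97 for the predominantly red bag. A conservative
judge, in the sense defined above, would report a value lying between
the prior of .50 and this figure.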

Cascaded  inference  Real-life  problems  often  have  several  stages,  with 
inferences  at  each  stage  relying  on  data  which  are  themselves  inferences 
from  unreliable  observations  or  reports.  For  example,  a physician  who 
uses  the  condition  of  the  patient's  lungs  as  a cue  for  diagnosis  must 
first  infer  that  condition  from  unreliable  data  (e.g.,  the  sound  of 
a thumped chest). Several normative models for such cascaded or
multi-stage inference tasks have been developed in recent years (217, 238).
Schum  (239)  has  shown  the  relevance  of  cascaded  inference  models  to 
the  judicial  problem  of  witness  credibility  and  the  probative  value  of 
witness  testimony. 

Descriptive studies of cascaded inference, comparing subjects'
responses in the laboratory with a normative model, have consistently
shown a result just the opposite of conservatism: subjects' posterior
probabilities are more extreme than those prescribed by the model
(100, 217, 266). The extremity of subjects' responses has been traced
to their use of a simple, but inappropriate, "best-guess" strategy (103,
137, 257, 266), which is insensitive to data unreliability.
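
The contrast between a normative treatment of an unreliable report and
the "best-guess" strategy can be sketched as follows. This is a
simplified two-hypothesis, single-report illustration in Python, not
one of the cited models. It assumes a report that is correct with a
fixed probability (`reliability`) regardless of which hypothesis is
true; all names and numbers are hypothetical.

```python
def normative_posterior(prior_h1, p_d_given_h1, p_d_given_h2, reliability):
    """Posterior for H1 after a source REPORTS datum D.

    Assumption: the report matches the true state of D with probability
    `reliability`, whether D is present or absent, and independently of
    the hypothesis.
    """
    # Probability of hearing "D" under each hypothesis: either D occurred
    # and was reported correctly, or D did not occur and was misreported.
    like_h1 = reliability * p_d_given_h1 + (1 - reliability) * (1 - p_d_given_h1)
    like_h2 = reliability * p_d_given_h2 + (1 - reliability) * (1 - p_d_given_h2)
    return prior_h1 * like_h1 / (prior_h1 * like_h1 + (1 - prior_h1) * like_h2)


def best_guess_posterior(prior_h1, p_d_given_h1, p_d_given_h2):
    """Posterior obtained by treating the reported datum as certainly true."""
    return prior_h1 * p_d_given_h1 / (
        prior_h1 * p_d_given_h1 + (1 - prior_h1) * p_d_given_h2
    )


if __name__ == "__main__":
    print(round(normative_posterior(0.5, 0.8, 0.2, 0.7), 2))  # discounted
    print(round(best_guess_posterior(0.5, 0.8, 0.2), 2))      # more extreme
```

With a report that is right 70% of the time, the normative posterior in
this example is .62, whereas ignoring the unreliability gives .80. The
best-guess value is the more extreme of the two, which is the direction
of the error described above.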

HEURISTICS AND BIASES In these recent studies of conservatism and
cascaded inference, one can see an increasing skepticism about the
normative model's ability to fulfill its descriptive role, and the view
of humans as good intuitive statisticians is no longer paramount. A
psychological Rip van Winkle who dozed off after reading Peterson &
Beach (218) and roused himself only recently would be startled by the
widespread change of attitude exemplified by statements such as "In his
evaluation of evidence, man is apparently not a conservative Bayesian:
he is not Bayesian at all" (138, p. 450), or ". . . man's cognitive
capacities are not adequate for the tasks which confront him" (114,
p. 4), or ". . . people systematically violate the principles of
rational decision making when judging probabilities, making predictions,
or otherwise attempting to cope with probabilistic tasks" (252, p. 169).

Van Winkle would be further surprised to see Hammond (114) and
Dawes (69) putting information-processing deficiencies on a par with
motivational conflicts as causes of the ills that plague humanity, and
to see financial analysts, accountants, geographers, statisticians and
others being briefed on the implications of these intellectual
shortcomings (14, 121a, 248, 249, 255, 282).

In  1971,  when  reviewing  the  literature  on  probabilistic  inference, 
Slovic  & Lichtenstein  (254)  found  only  a handful  of  studies  that  looked 
at  subjects'  information-processing  heuristics.  Since  then,  rather 
than simply comparing behavior with normative models, almost every
descriptive study of probabilistic thinking has attempted to determine
how  the  underlying  cognitive  processes  are  molded  by  the  interaction 
between  the  demands  of  the  task  and  the  limitations  of  the  thinker. 

Much of the impetus for this change can be attributed to Tversky
& Kahneman's (138, 139, 284, 285, 286) demonstrations of three
judgmental heuristics (representativeness, availability and anchoring)
which determine probabilistic judgments in a variety of tasks. Although
always  efficient,  and  at  times  valid,  these  heuristics  can  lead  to 
biases  that  are  large,  persistent,  and  serious  in  their  implications 
for  decision  making. 

Judgment  by  representativeness  What  is  the  probability  that  object 
B belongs  to  class  A?  Or,  what  is  the  probability  that  process  A will 
generate  event  B?  Kahneman  & Tversky  (138)  hypothesized  that  people 
answer  such  questions  by  examining  the  essential  features  of  A and  of 
B and  assessing  the  degree  of  similarity  between  them,  the  degree  to 
which  B is  "representative"  of  A.  When  B is  very  similar  to  A,  as 
when  an  outcome  is  highly  representative  of  the  process  from  which  it 
originates,  then  its  probability  is  judged  to  be  high. 

Several lines of evidence support this hypothesis. Tversky &
Kahneman (284) demonstrated a belief in what they called "the law of
small numbers" whereby even small samples are viewed as highly
representative of the populations from which they are drawn. This belief
led their subjects, research psychologists, to underestimate the error
and unreliability inherent in small samples of data. Kahneman & Tversky
(138) showed that both subjective sampling distributions and posterior
probability estimates were insensitive to sample size, a normatively
important but psychologically non-representative factor. In a
subsequent paper, Kahneman & Tversky (139) demonstrated that people's
intuitive predictions violate normative principles in ways that can be
attributed to representativeness biases. For one, representativeness
causes prior probabilities to be neglected. For another, predictions
tend not to be properly regressive, being insensitive to considerations
of data reliability.
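
Why sample size is normatively important can be seen from the binomial
sampling distribution itself. The sketch below, in Python, computes the
chance that a sample proportion reaches a given threshold when the
population proportion is .5. The threshold of .6, the sample sizes, and
the function name are illustrative assumptions and are not drawn from
the experiments cited above.

```python
from math import ceil, comb


def prob_proportion_at_least(n, threshold, p=0.5):
    """P(sample proportion >= threshold) for n independent draws.

    Each draw is a success with probability p (binomial sampling).
    """
    # Smallest success count meeting the threshold; the small offset guards
    # against floating-point error in threshold * n.
    k_min = ceil(threshold * n - 1e-9)
    return sum(
        comb(n, k) * (p ** k) * ((1 - p) ** (n - k)) for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    for n in (10, 100):
        print(n, round(prob_proportion_at_least(n, 0.6), 4))
```

A sample proportion of .6 or more is common with 10 observations and
rare with 100. A judge who attends only to how representative the
sample proportion looks will treat the two cases alike.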

Judgment by availability Other judgmental biases are due to use of
the "availability" heuristic (283) whereby an event is judged likely
or frequent if it is easy to imagine or recall relevant instances.
In life, instances of frequent events are typically easier to recall
than  instances  of  less  frequent  events,  and  likely  occurrences  are 
usually  easier  to  imagine  than  unlikely  ones.  Thus,  availability  is 
often  a valid  cue  for  the  assessment  of  frequency  and  probability. 
However,  since  availability  is  also  affected  by  subtle  factors  unrelated 
to  likelihood,  such  as  familiarity,  recency,  and  emotional  saliency, 
reliance  on  it  may  result  in  systematic  biases. 


Judgment by adjustment Another error-prone heuristic is "anchoring
and adjustment." With this process, a natural starting point or
anchor is used as a first approximation to the judgment. The anchor
is then adjusted to accommodate the implications of additional
information. Typically, the adjustment is imprecise and insufficient (248).
Tversky & Kahneman (286) showed how anchoring and adjustment could
cause the overly narrow confidence intervals found by many
investigators (175) and the tendency to misjudge the probability of
conjunctive and disjunctive events (16, 57, 317).
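
The arithmetic behind the conjunctive and disjunctive case is simple
and shows how far the correct answer can lie from the natural anchor,
the probability of a single component event. The following Python
sketch assumes independent events of equal probability; the particular
values (.9, .1, and seven events) are chosen only for illustration.

```python
def conjunctive_probability(p, n):
    """Probability that all n independent events, each of probability p, occur."""
    return p ** n


def disjunctive_probability(p, n):
    """Probability that at least one of n independent events occurs."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    # Seven steps that must all succeed, each with probability .9.
    print(round(conjunctive_probability(0.9, 7), 3))
    # Seven components, any one of which may fail with probability .1.
    print(round(disjunctive_probability(0.1, 7), 3))
```

In this example the conjunction has a probability of about .48, well
below the anchor of .9, and the disjunction about .52, well above the
anchor of .1. Insufficient adjustment from the anchor therefore
overestimates the former and underestimates the latter.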

Related work Numerous studies have replicated and extended the Kahneman
& Tversky studies, and others have independently arrived at similar
conclusions. The representativeness heuristic has received the most
attention. Wise & Mockovak (310), Bar-Hillel (17), and Teigen (278,
279) have documented the importance of similarity structures in
probability judgment. Like Kahneman & Tversky (138), Marks & Clarkson
(191, 192) and Svenson (271) observed that subjects' posterior
probabilities in binomial bookbag and poker chip tasks were predominantly
influenced by the most representative aspect of the sample, the
proportion of red chips. Contrary to the normative model, population
proportion and sample size were relatively unimportant. Leon & Anderson
(166) did find an influence of these two characteristics and, as a result,
claimed that Kahneman & Tversky's subjects must have misunderstood the
task. Ward (302), however, argued that the conflicting results were
most likely due to differences in the tasks, rather than to
misinterpretation of instructions. Hammerton (113), Lyon & Slovic (184),
Nisbett & Borgida (210), and Borgida & Nisbett have replicated Kahneman
& Tversky's finding that subjects neglect population base rates when
judging the probability that an individual belongs to a given category.
Nisbett & Borgida argued that this neglect stems in part from the
abstract, applied, statistical character of base-rate information. They
found that concrete, case-specific information, even from a sample of
one, may have much greater importance, a rather dramatic illustration
of the law of small numbers. Additional evidence for representativeness
comes from studies by Brickman & Pierce (45), Holzworth & Doherty
(123), Bauer (20, 21) and Lichtenstein, Earle & Slovic (173).

Availability and anchoring have been studied less often. Evidence
of availability bias has been found by Borgida & Nisbett and Slovic,
Fischhoff & Lichtenstein (252). Anchoring has been hypothesized to
account for the effects of response mode upon bet preferences (176,
177), and it has been proposed as a method that people use to reduce
strain when making ratio judgments (106). Pitz (219) gave the anchoring
heuristic a key role in his model describing how people create
subjective probability distributions for imperfectly known (uncertain)
quantities.

Overconfidence The evidence presented above suggests that the heuristic
selected, the way it is employed and the accuracy of the judgment it
produces are all highly problem-specific; they may even vary with
different representations of the same problem. Indeed, heuristics may
be faulted as a general theory of judgment because of the difficulty
of knowing which will be applied in any particular instance.

Borgida, E. & Nisbett, R. E. Abstract vs. concrete information:
The senses engulf the mind. Unpublished, University of Michigan, 1976.

There is, however, one fairly valid generalization that may be
derived from this literature. Except for some Bayesian inference
tasks, people tend to be overconfident in their judgments. This may
be seen in their non-regressive predictions (139), in their disregard
for the extent of the data base upon which their judgments rest (138),
or its reliability (217), and in the miscalibration of their
probabilities for discrete and continuous propositions (175). Howell
(128) has repeatedly shown that people overestimate their own abilities
on tasks requiring skill (e.g., throwing darts). Langer (163) dubbed this
effect "the illusion of control" and demonstrated that it can be
induced by introducing skill factors (such as competition and choice)
into chance situations.

In  a task  that  had  people  estimate  the  odds  that  they  had  been 
able  to  select  the  correct  answer  to  general  knowledge  questions,  Slovic, 
Fischhoff  & Lichtenstein  (251)  found  that  wrong  answers  were  often 
given  with  certainty.  Furthermore,  subjects  had  sufficient  faith  in 
their  odds  that  they  were  willing  to  participate  in  a gambling  game 
that  punished  them  severely  for  their  overconfidence. 
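
Miscalibration of this kind is usually assessed by comparing stated
probabilities with the proportion of answers that turn out to be
correct. The Python sketch below shows one simple way to tabulate such
a comparison. The data format, the rounding of probabilities to one
decimal place, the function names, and the sample judgments are all
hypothetical and are not the procedure or data of any study cited here.

```python
from collections import defaultdict


def calibration_table(judgments):
    """Map each stated probability (rounded to .1) to the observed hit rate.

    `judgments` is an iterable of (stated_probability, was_correct) pairs.
    """
    buckets = defaultdict(list)
    for stated, was_correct in judgments:
        buckets[round(stated, 1)].append(1 if was_correct else 0)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


def overconfidence(judgments):
    """Mean stated probability minus overall proportion correct.

    Positive values indicate overconfidence; zero indicates that, on
    average, confidence matches accuracy.
    """
    judgments = list(judgments)
    mean_stated = sum(p for p, _ in judgments) / len(judgments)
    hit_rate = sum(1 for _, ok in judgments if ok) / len(judgments)
    return mean_stated - hit_rate


if __name__ == "__main__":
    # Hypothetical responses: four answers given with certainty, three at .6.
    sample = [(1.0, True), (1.0, False), (1.0, True), (1.0, True),
              (0.6, True), (0.6, True), (0.6, False)]
    print(calibration_table(sample))
    print(round(overconfidence(sample), 3))
```

In the hypothetical sample, answers given with certainty are correct
only three times in four, the pattern of wrong answers held with
certainty that is described above.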

How do we maintain this overconfidence? One possibility is that
the environment is often not structured to show our limits. Many
decisions we make are quite insensitive to errors in estimating what we
want (utilities) or what is going to happen (probabilities) so that
errors in estimation are hard to detect (249a). Sometimes we receive
no feedback at all. Even when we do, we may distort its meaning to
exaggerate our judgmental prowess, perhaps convincing ourselves that
the outcome we got was what we really wanted. Langer & Roth (164)
found that subjects who experienced initial successes in a repetitive
task overremembered their own past successes. Fischhoff & Beyth (93)
found that people asked to recall their own predictions about past
events remembered having assigned higher probabilities to events that
later occurred than was actually the case. Fischhoff (89) also found
that people (a) overestimate the extent to which they would have been
able to predict past events had they been asked to do so, and (b)
exaggerate the extent to which others should have been able to predict
past events. These hindsight biases are further evidence of
overconfidence, for they show that people have inordinately high opinions
of their own predictive abilities.

Descriptive theories Most of the research on heuristics and biases
can be considered pre-theoretical. It has documented the descriptive
shortcomings of the normative model and produced concepts such as
representativeness and anchoring that may serve as the bases for new
descriptive theories. Although theory development has been limited thus
far, efforts by Wallsten (300, 301) and Shanteau (243, 244) to produce
descriptive algebraic models are noteworthy. Shanteau's approach is
based upon the averaging model of Anderson's integration theory (7).
Wallsten's model, formulated and tested within the framework of
conjoint measurement, assumes that limited capacity causes people to
process dimensions of information sequentially and weight them
differentially, according to their salience.

Choice 

In their introduction to two volumes on contemporary developments
in mathematical psychology, Krantz et al. (159) explained their exclusion
of the entire area of preferential choice as follows:


There  is  no  lack  whatever  of  technically  excellent  papers 
in  this  area  but  they  give  no  sense  of  any  real  cumulation 
of  knowledge.  What  are  established  laws  of  preferential 
choice  behavior?  (Since  three  of  the  editors  have  worked 
in  this  area,  our  attitude  may  reflect  some  measure  of  our 
own  frustration.)  (p.  xii) 

This sense of frustration is understandable when one reviews
recent research on choice. The field is in a state of transition, moving
away from the assumption that choice probability is expressible as a
monotone function of the scale values or utilities of the alternatives.
Present efforts are aimed at developing more detailed, molecular
concepts that describe choice in terms of information-processing phenomena.
Researchers appear to be searching for heuristics or modes of processing
information that are common to a wide domain of subjects and choice
problems. However, they are finding that the nature of the task is
a prime determinant of the observed behavior.

ELIMINATION BY ASPECTS One major new choice theory is Tversky's (280,
281) elimination-by-aspects (EBA) model. The model describes choice
as a covert sequential elimination process. Alternatives are viewed
as sets of aspects (e.g., cars described by price, model, color, etc.).
At each stage in the choice process, an aspect is selected with
probability proportional to its importance; alternatives that are
unsatisfactory on the selected aspect are eliminated. Tversky showed that the
EBA model generalizes the models of Luce (183) and Restle (228) while
avoiding some of the counter-examples to which these earlier models
are susceptible. Searching for even broader applicability, Corbin
& Marley (62) proposed a random utility model that includes the EBA
model as a special case. Other models built around the concept of
successive elimination of alternatives have been developed by Hogarth
(121, 122) and Pollay (220).
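
The elimination process just described can be sketched in a few lines
of Python. This is a simplified simulation written for illustration,
not Tversky's formal statement of the model. It assumes that each
alternative is a set of named aspects, that each aspect has a positive
importance weight, that aspects shared by every remaining alternative
are ignored because they cannot discriminate, and that ties among
indistinguishable alternatives are broken at random. All names and
weights are hypothetical.

```python
import random


def eliminate_by_aspects(alternatives, weights, rng=None):
    """Simulate one choice by sequential elimination.

    alternatives: dict mapping a name to the set of aspects it possesses.
    weights: dict mapping each aspect to a positive importance weight.
    """
    rng = rng or random.Random()
    remaining = dict(alternatives)
    while len(remaining) > 1:
        shared = set.intersection(*remaining.values())
        present = set.union(*remaining.values())
        # Only aspects that distinguish among the remaining alternatives
        # can eliminate anything.
        candidates = sorted(present - shared)
        if not candidates:
            # Remaining alternatives are indistinguishable (assumed tie rule).
            return rng.choice(sorted(remaining))
        # Select an aspect with probability proportional to its weight.
        aspect = rng.choices(
            candidates, weights=[weights[a] for a in candidates], k=1
        )[0]
        # Eliminate every alternative that lacks the selected aspect.
        remaining = {
            name: aspects for name, aspects in remaining.items() if aspect in aspects
        }
    return next(iter(remaining))


if __name__ == "__main__":
    cars = {
        "sedan": {"low price", "blue"},
        "coupe": {"low price", "red"},
        "wagon": {"roomy", "red"},
    }
    importance = {"low price": 5.0, "roomy": 3.0, "red": 1.0, "blue": 1.0}
    print(eliminate_by_aspects(cars, importance, random.Random(1)))
```

Because the selected aspect is always possessed by at least one but not
all of the remaining alternatives, each stage removes at least one
alternative and the loop must terminate. Repeating the simulation many
times gives estimated choice probabilities for a given set of weights.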

PROCESS DESCRIPTION Most recent empirical research has been concerned
with describing the decision maker's methods for processing information
before choosing. Whereas earlier work focused on external products
(e.g., choice proportions and rankings) and used rather simple methods,
process-descriptive studies must employ more complex procedures for
collecting and analyzing data. Thus, we find a return to
introspective methods (29, 199, 272) in which subjects are asked to think
aloud as they choose among various multi-attribute alternatives. Bettman &
Jacoby (31) and Payne (214) supplemented the think-aloud procedure by
requiring subjects to seek information from envelopes on an "information
board." Russo & Rosen (231) used eye-movement data conjointly with
verbal protocols. One goal of these studies is to represent the choice
process graphically as a tree or network (discrimination net) of
successive decisions. Swinth, Gaumnitz & Rodriguez (275) developed a
method of controlled introspection that enables subjects to build and
validate their own discrimination nets. Bettman (27) showed how to
describe such nets via graph-theoretical concepts. Uneasy about the
subjectivity of introspective techniques, Hogarth (121) used an
ingenious blend of theory and empiricism to develop a computer algorithm
that builds the tree without recourse to subjective inputs.

Can introspective methods be trusted? Nisbett & Wilson reopened
an old debate by arguing that people lack awareness of the factors that
affect their judgments. After documenting this claim with results
from six experiments, they concluded that "Investigators who are
inclined to place themselves at the mercy of such [introspective]
reports . . . would be better advised to remain in the armchair" (p. 35).
While important, this criticism may be overstated. Students of choice
have in many instances validated their introspective reports against
theoretical predictions (199) and data from other sources (214,
Footnote 5).

What  do  these  methodologies  tell  us  about  choice?  First,  they 
indicate that subjects use many rules and strategies en route to a
decision. These include conjunctive, disjunctive, lexicographic and
compensatory  rules  and  the  principle  of  dominance  (274).  A typical 
choice  may  involve  several  stages,  utilizing  different  rules  at  dif- 
ferent junctures.  Early  in  the  process,  subjects  tend  to  compare  a 
number  of  alternatives  on  the  same  attribute  and  use  conjunctive  rules 
to  reject  some  alternatives  from  further  consideration  (26,  214,  245, 
272).  Later,  they  appear  to  employ  compensatory  weighting  of  advan- 
tages and  disadvantages  on  the  reduced  set  of  alternatives  (214). 

Features  of  the  task  that  complicate  the  decision,  such  as  incomplete 
data,  incommensurable  data  dimensions,  information  overload,  time  pres- 
sures and  many  alternatives  seem  to  encourage  strain-reducing,  non- 
compensatory strategies  (214,  255,  313,  314).  Svenson  (272)  and  Russo 
& Rosen (231) found subjects reducing memory load by comparing two
alternatives at a time and retaining only the better one for later
comparisons. Russo & Dosher³ observed simple strategies, such as
counting the number of dimensions favoring each alternative or ignoring
small differences between alternatives on a particular dimension.
In some instances, these strategies led to sub-optimal choices.

² Nisbett, R. E. & Wilson, T. D. Awareness of factors influencing
one's own evaluations, judgments, and behavior. Unpublished, University
of Michigan, 1976.

In  general,  people  appear  to  prefer  strategies  that  are  easy  to 
justify  and  don't  involve  reliance  on  relative  weights,  trade-off 
functions  or  other  numerical  computations.  One  implication  of  this 
was  noted  by  Slovic  (250),  whose  subjects  were  forced  to  choose  among 
pairs  of  alternatives  that  were  equal  in  value  for  them.  Rather  than 
choose  randomly,  subjects  consistently  followed  the  easy  and  defensible 
strategy  of  selecting  the  alternative  that  was  superior  on  the  more 
important  dimension. 
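
The rules named in this section can be stated compactly. The following is a minimal sketch, not a model from any of the cited studies: the apartment data, attribute names, cutoffs and weights are all hypothetical, and every attribute is scored so that higher is better. It shows a conjunctive screen followed by a compensatory weighting of the survivors, alongside a lexicographic rule and the dimension-counting strategy.

```python
from typing import Dict, List

Alternatives = Dict[str, Dict[str, float]]


def conjunctive_screen(alts: Alternatives, cutoffs: Dict[str, float]) -> Alternatives:
    """Keep only the alternatives that meet every minimum cutoff."""
    return {
        name: attrs
        for name, attrs in alts.items()
        if all(attrs[attr] >= minimum for attr, minimum in cutoffs.items())
    }


def lexicographic_choice(alts: Alternatives, order: List[str]) -> str:
    """Choose on the most important attribute; use later ones only to break ties."""
    candidates = list(alts)
    for attr in order:
        best = max(alts[name][attr] for name in candidates)
        candidates = [name for name in candidates if alts[name][attr] == best]
        if len(candidates) == 1:
            break
    return candidates[0]


def compensatory_choice(alts: Alternatives, weights: Dict[str, float]) -> str:
    """Choose the alternative with the highest weighted sum of attribute values."""
    return max(alts, key=lambda name: sum(w * alts[name][a] for a, w in weights.items()))


def tally_choice(alts: Alternatives, first: str, second: str) -> str:
    """Count the dimensions favoring each of two alternatives (ties favor neither)."""
    wins_first = sum(alts[first][a] > alts[second][a] for a in alts[first])
    wins_second = sum(alts[second][a] > alts[first][a] for a in alts[first])
    return first if wins_first >= wins_second else second


# Hypothetical alternatives, each attribute rated 1-10 (higher is better).
apartments: Alternatives = {
    "A": {"price": 8, "size": 3, "location": 9},
    "B": {"price": 6, "size": 7, "location": 6},
    "C": {"price": 2, "size": 9, "location": 9},
    "D": {"price": 7, "size": 6, "location": 4},
}

# Stage 1: conjunctive screening on two attributes.
survivors = conjunctive_screen(apartments, {"price": 5, "size": 5})

# Stage 2: different rules applied to the reduced set.
by_weights = compensatory_choice(survivors, {"price": 0.5, "size": 0.3, "location": 0.2})
by_lexicographic = lexicographic_choice(survivors, ["price", "size", "location"])
by_tally = tally_choice(survivors, "B", "D")
```

With these made-up numbers the rules disagree about the reduced set, which is one way a simple strategy can lead to a choice that a weighted evaluation would not endorse.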

SCRIPT  PROCESSING  Abelson's  (1)  new  approach  to  explaining  decisions 
warrants  further  study.  It  is  based  on  the  concept  of  a "cognitive 
script,"  which  is  a coherent  sequence  of  events  expected  by  the  indi- 
vidual on  the  basis  of  prior  learning  or  experience.  When  faced  with 
a decision,  individuals  are  hypothesized  to  bring  relevant  scripts 
into play. For example, Candidate Y's application for graduate school
may  be  rejected  because  Y reminds  the  decision  maker  of  Candidate  X, 
who  was  accepted  and  failed  miserably.  Another  script  might  assimilate 
the  candidate  into  a category  (He's  one  of  those  shy  types  who  does 
well  in  courses,  but  doesn't  have  enough  initiative  in  research). 


³ Russo, J. E. & Dosher, B. A. Dimensional evaluation: A heuristic
for binary choice. Unpublished, University of California, Santa Barbara,
1975.

Script theory, though still in a highly speculative stage, suggests
a type  of  explanation  for  choice  that  has  thus  far  been  overlooked. 

CONSUMER  CHOICE  Much  research  on  choice  has  been  done  within  the 
domain  of  consumer  psychology.  Comprehensive  reviews  of  this  research 
have  been  provided  by  Jacoby  (134,  135).  Although  some  of  this  work 
is  application  of  multiple  regression,  conjoint  measurement,  and  anal- 
ysis of  variance  to  describe  consumers'  values  (30,  107,  312),  many 
other  studies  have  investigated  basic  psychological  questions.  For 
example,  one  major  issue  has  been  the  effect  of  amount  and  display  of 
information on the optimality of choice. Jacoby and his colleagues
have argued that more information is not necessarily helpful, as it
can overload consumers and lead them to select sub-optimal products.
Russo,  Krieser  & Miyashita  (230)  observed  that  subjects  had  great  dif- 
ficulty finding  the  most  economical  product  among  an  array  of  different 
prices  and  packages.  Even  unit  prices,  which  do  the  arithmetic  for 
the  consumer,  had  little  effect  on  buyer  behavior  when  posted  on  the 
shelf  below  each  product.  However,  when  prices  per  unit  were  listed  in 
order  from  high  to  low  cost,  shoppers  began  to  buy  less  expensive 
products.

Models  of  Risky  Choice 

Decision  making  under  conditions  of  risk  has  been  studied  exten- 
sively. This  is  probably  due  to  the  availability  of  (a)  an  appealing 
research  paradigm,  choices  among  gambles,  and  (b)  a dominant  normative 
theory,  the  subjectively  expected  utility  (SEU)  model,  against  which 
behavior  can  be  compared.  The  SEU  model  assumes  that  people  behave 
as  though  they  maximized  the  sum  of  the  products  of  utility  and  proba- 
bility. 
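
As a minimal sketch of the computation the model assumes, a gamble can be written as a list of (subjective probability, payoff) pairs and scored by the sum of probability times utility. The gambles and the two utility functions below are hypothetical illustrations, not values from the studies reviewed.

```python
import math
from typing import Callable, List, Tuple

Gamble = List[Tuple[float, float]]  # (subjective probability, payoff) pairs


def seu(gamble: Gamble, utility: Callable[[float], float]) -> float:
    """Subjectively expected utility: sum of probability times utility of payoff."""
    return sum(p * utility(x) for p, x in gamble)


def linear_utility(x: float) -> float:
    """Utility proportional to money (risk-neutral)."""
    return x


# Hypothetical choice: a sure $40 versus an even chance of $100 or nothing.
sure_thing: Gamble = [(1.0, 40.0)]
risky: Gamble = [(0.5, 100.0), (0.5, 0.0)]

# A risk-neutral chooser prefers the risky gamble (expected value 50 vs 40);
# a concave (square-root) utility reverses the preference.
prefers_risky_linear = seu(risky, linear_utility) > seu(sure_thing, linear_utility)
prefers_risky_concave = seu(risky, math.sqrt) > seu(sure_thing, math.sqrt)
```

The same gambles are ranked differently under the two utility functions, which is why tests of the model require assumptions about, or measurement of, the chooser's utilities.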

Early studies of the model's descriptive adequacy produced conflicting
results. Situational and task parameters were found to have
strong  effects,  leading  Rapoport  & Wallsten  (226)  to  observe  that  a 
researcher  might  accept  SEU  with  one  set  of  bets  and  reject  it  with 
another,  differently  structured  set.  Proponents  of  the  SEU  model 
point  out  that  it  gives  a good  global  fit  to  choice  data,  particularly 
for simple gambles.⁴ In addition, certain assumptions of the model,
like  the  independent  (multiplicative)  combination  of  probabilities 
and  payoffs,  have  been  verified  for  simple  gambles  (244,  299). 

However,  during  the  past  five  years,  the  proponents  of  SEU  have 
been  greatly  outnumbered  by  its  critics.  Coombs  (60)  has  argued  that 
risky  choice  is  determined  not  by  SEU,  but  by  a compromise  between 
maximization  of  expected  value  (EV)  and  optimization  of  risk.  He  pro- 
posed an  alternative  to  SEU,  "portfolio  theory,"  in  which  risk  prefer- 
ences play  a central  role.  That  role  is  illustrated  in  a study  by 
Coombs  & Huang  (61)  in  which  a gamble,  B,  was  constructed  as  a proba- 
bility mixture  of  two  other  gambles,  A and  C.  Many  subjects  preferred 
gamble  B (with  its  intermediate  risk  level)  to  gambles  A and  C,  thus 
violating  a fundamental  axiom  of  SEU  theory. 
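
The axiom at issue can be made explicit. As a sketch under the usual expected-utility assumptions (the notation is introduced here for illustration and is not taken from Coombs & Huang): if B yields gamble A with probability p and gamble C with probability 1 - p, then the SEU of B is a weighted average of the other two.

```latex
\begin{align*}
\mathrm{SEU}(B) &= p\,\mathrm{SEU}(A) + (1-p)\,\mathrm{SEU}(C), \qquad 0 < p < 1,\\
\min\{\mathrm{SEU}(A),\,\mathrm{SEU}(C)\} &\le \mathrm{SEU}(B) \le \max\{\mathrm{SEU}(A),\,\mathrm{SEU}(C)\}.
\end{align*}
```

Under SEU, then, B can never rank strictly above both A and C, so a preference for the intermediate-risk mixture over both of its components is inconsistent with the model.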

Zagorski (318) demonstrated a result that appears to violate SEU
and  many  other  algebraic  models  as  well.  Zagorski's  subjects  were 
shown  pairs  of  gambles  (A,  B)  and  were  asked  to  judge  the  amount  of 
money  (A-B)  that  would  induce  them  to  trade  the  better  gamble  (A)  for 
the worse gamble (B). He demonstrated that one can construct quadruples
of gambles A, B, C and D such that

(A-B) + (B-C) ≠ (A-D) + (D-C).

In other words, path independence is violated. The difference between
gambles A and C depends on whether the intermediate gamble is B or D.

⁴ Goodman, B., Saltzman, M., Edwards, W., & Krantz, D. Predictions of
bids for two-outcome gambles in a casino setting. Unpublished, 1976.

A favorite  approach  of  SEU  critics  is  to  develop  counterexamples 
to the fundamental axioms of the theory. The paradoxes of Allais (4)
and  Ellsberg  (85)  are  two  of  the  most  famous,  both  designed  to  invali- 
date Savage's  (232)  independence  principle.  Until  recently,  few  theo- 
rists were  convinced.  MacCrimmon  (185)  showed  that  business  executives 
who  violated  various  axioms  could  easily  be  led,  via  discussion,  to 
see the error of their ways. However, Slovic & Tversky (256) challenged
MacCrimmon's discussion procedure on the grounds that it pressured
the subjects to accept the axioms. They presented subjects with
arguments  for  and  against  the  independence  axiom  and  found  persistent 
violations,  even  after  the  axiom  was  presented  in  a clear  and  presumably 
compelling  fashion.  Moskowitz  (200)  used  a variety  of  problem  repre- 
sentations (matrix  formats,  trees,  and  verbal  presentations)  to  clarify 
the  principle  and  maximize  its  acceptability,  yet  still  found  that  the 
independence axiom was rejected. Even MacCrimmon's faith in many of
the key axioms has been shaken by recent data (see 188), leading him
to  suggest  that  reevaluation  of  the  theory  is  in  order. 

Kahneman & Tversky (140, 283) attempted this sort of reevaluation,
presenting  evidence  for  two  pervasive  violations  of  SEU  theory.  One, 
the  "certainty  effect,"  causes  consequences  that  are  obtained  with 
certainty  to  be  valued  more  than  uncertain  consequences.  The  Allais 
paradox  may  be  due  to  this  effect.  The  second,  labeled  the  "reference 
effect,"  leads  people  to  evaluate  alternatives  relative  to  a reference 
point corresponding to their status quo, adaptation level, or expectation.
By altering the reference point, formally equivalent versions
of  the  same  decision  problem  may  elicit  different  preferences.  These 
effects  pose  serious  problems  for  the  normative  theory  and  its  appli- 
cation. 
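
To see how the certainty effect collides with SEU, consider the standard textbook form of the Allais problem. The payoffs and probabilities below are the commonly quoted ones and are used here only as an illustration; they are not necessarily the stimuli of the studies cited above. Write u(0), u(1) and u(5) for the utilities of $0, $1 million and $5 million.

```latex
\begin{align*}
&\text{Problem 1: } A = (\$1\text{M for sure}) \quad\text{vs.}\quad
   B = (0.10,\ \$5\text{M};\ 0.89,\ \$1\text{M};\ 0.01,\ \$0)\\
&\text{Problem 2: } C = (0.11,\ \$1\text{M};\ 0.89,\ \$0) \quad\text{vs.}\quad
   D = (0.10,\ \$5\text{M};\ 0.90,\ \$0)\\[4pt]
&A \succ B \;\Rightarrow\; u(1) > 0.10\,u(5) + 0.89\,u(1) + 0.01\,u(0)
   \;\Rightarrow\; 0.11\,u(1) > 0.10\,u(5) + 0.01\,u(0)\\
&D \succ C \;\Rightarrow\; 0.10\,u(5) + 0.90\,u(0) > 0.11\,u(1) + 0.89\,u(0)
   \;\Rightarrow\; 0.10\,u(5) + 0.01\,u(0) > 0.11\,u(1)
\end{align*}
```

The two implied inequalities contradict each other, so no assignment of utilities can reproduce the common pattern of choosing the certain option A in the first problem and the higher-payoff option D in the second.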

Payne  (213)  proposed  replacing  the  SEU  model  with  information 
processing  theories  that  describe  how  probabilities  and  payoffs  are 
integrated  into  decisions.  He  presented  a "contingent  process"  model 
to  describe  the  sequential  processes  involved  in  choice  among  gambles. 

For  support,  he  cited  a number  of  display  and  response-mode  effects 
that  are  due  to  processing  difficulties  (176,  177,  179,  215).  Koziel- 
ecki's  (158)  discussion  of  the  internal  representation  of  risky  tasks 
carried  a similar  message. 

Kunreuther  (160)  has  argued  that  utility  theory  would  be  of  little 
value  to  a policy  maker  trying  to  predict  how  people  would  respond  to 
various  flood  or  earthquake  insurance  programs.  First,  the  theory 
makes  predictions  that  are  not  borne  out  by  actual  behavior — for  example, 
that  people  will  prefer  policies  with  high  deductibles  or  that  subsi- 
dizing premiums  will  increase  insurance  purchasing.  Second,  it  gives 
no  guidance  about  the  social,  situational  and  cognitive  factors  that 
are  likely  to  influence  insurance  purchase.  Like  Payne,  Kunreuther 
called  for  an  alternative  theory,  founded  on  the  psychology  of  human 
information  processing,  and  presented  a model  of  his  own  to  support 
his  case. 

Readers interested in additional attacks on the staggering SEU
model should consult Barron & Mackenzie (19), Davenport & Middleton
(66),  Fryback,  Goodman  & Edwards  (99),  Ronen  (229),  and  Svenson  (273). 


Regression  Approaches 


The  regression  paradigm  uses  analysis  of  variance,  conjoint 
measurement  and  multiple  regression  techniques  to  develop  algebraic 
models  that  describe  the  method  by  which  individuals  weight  and  com- 
bine information. 

INTEGRATION  THEORY  Working  within  the  framework  of  "information  inte- 
gration theory,"  Anderson  and  his  colleagues  have  shown  that  simple 
algebraic  models  describe  information  use  quite  well  in  an  impressive 
variety  of  judgmental,  decision  making,  attitudinal,  and  perceptual 
tasks  (6,  7).  These  models  typically  have  revealed  stimulus  averaging, 
although some subtracting and multiplying has been observed. Particularly
relevant to decision making are studies of risk taking and inference
(244), configurality in clinical judgment (5), intuitive statistics
(167, 168), preference for bus transportation (210a), and
judgment  in  stud  poker  (181).  There  is  no  doubt  that  algebraic  models 
derived  from  Anderson's  techniques  provide  good  surface  descriptions 
of  judgmental  processes.  However,  as  Graesser  & Anderson  (106)  have 
observed,  establishment  of  an  algebraic  model  is  only  the  first  step 
toward  disclosing  the  underlying  cognitive  mechanisms,  which  may  be 
rather  different  from  the  surface  form  of  the  model. 

POLICY  CAPTURING  Another  form  of  the  regression  paradigm  uses  corre- 
lational statistics  to  provide  judgmental  models  in  realistic  settings. 
The  most  systematic  development  of  these  procedures  has  been  made  by 
Hammond  and  his  colleagues  (117)  within  "social  judgment  theory."  This 
theory  assumes  that  most  judgments  depend  upon  a mode  of  thought  that 
is quasi-rational, that is, a synthesis of analytic and intuitive processes.
The elements of quasi-rational thought are cues (attributes),
their  weights,  and  their  functional  relationships  (linear  and  non- 
linear) to  both  the  environment  and  the  judge's  responses.  Brunswik's 
lens  model  and  multiple  regression  analysis  are  used  to  derive  equa- 
tions representing  the  judge's  cue  utilization  policy.  Judgmental 
performance  is  analyzed  into  knowledge  and  "cognitive  control,"  the 
latter  being  the  ability  to  employ  one's  knowledge  consistently  (118). 
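
The regression step in policy capturing can be sketched as follows. This is an illustrative least-squares fit written from scratch, not Hammond's procedure or software; the cue values and the judge's ratings are hypothetical, and the ratings are constructed without error so that the recovered weights are easy to check.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def capture_policy(cues: Sequence[Sequence[float]], judgments: Sequence[float]) -> List[float]:
    """Least-squares cue weights (intercept first) from the normal equations."""
    rows = [[1.0] + list(c) for c in cues]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, judgments)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical admissions judge: two cues per applicant (grade average, test score).
applicant_cues = [(3.0, 5.0), (3.5, 6.0), (2.5, 8.0), (4.0, 4.0), (3.2, 7.0), (3.8, 9.0)]
# Ratings generated as 1 + 2 * grades + 0.5 * test, with no judgmental error.
ratings = [9.5, 11.0, 10.0, 11.0, 10.9, 13.1]

intercept, weight_grades, weight_test = capture_policy(applicant_cues, ratings)
```

With real judges the ratings contain inconsistency, so the fitted equation reproduces them only approximately; the size of that residual is what separates a judge's knowledge from the consistency with which it is applied.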

By  1971,  it  was  evident  that  linear  models  could  describe  college 
students'  cue-weighting  policies  in  a wide  variety  of  laboratory  tasks 
(254).  During  the  past  five  years,  such  models  have  been  used  with 
similar  success  to  analyze  complex  real-world  judgments.  Judges  in 
these  studies  have  included  business  managers  (119,  193,  201,  202), 
graduate  admissions  committees  (68,  237),  auditors,  accountants,  and 
loan  officers  (13,  172,  315),  military  officers  (277),  literary  critics 
(84),  and  trout  hatchery  employees  (184),  as  they  attempted  to  predict 
business  failures  and  stock  market  performance,  select  graduate  stu- 
dents, plan  work  force  and  production  schedules,  evaluate  accounting 
procedures, Air Force cadets, and theatrical plays, and recommend trout
streams.  Even  U.S.  senators  have  been  modeled  and  their  roll-call  votes 
predicted  (298).  As  in  the  laboratory  studies,  linear  equations  have 
predicted  these  complex  judgments  quite  accurately.  The  coefficients 
of  these  equations  have  provided  useful  descriptions  of  the  judges' 
cue-weighting  policies  and  have  pinpointed  the  sources  of  inter-judge 
disagreement  and  non-optimal  cue  use. 

While  policies  were  being  captured  in  the  field,  other  researchers 
were  deepening  our  understanding  of  the  models.  Dawes  & Corrigan  (70) 
observed  that  linear  models  have  typically  been  applied  in  situations 
in which (a) the predictor variables are monotonically related to
the  criterion  (or  can  be  easily  rescaled  to  be  monotonic),  and  (b) 
there  is  error  in  the  independent  and  dependent  variables.  They 
demonstrated  that  these  conditions  insure  good  fits  by  linear  models, 
regardless  of  whether  the  weights  in  such  models  are  optimal.  Thus, 
the  linearity  observed  in  judges'  behaviors  may  be  reflecting  only 
a characteristic  of  linear  models,  not  a characteristic  of  human  judg- 
ment. 
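
A small simulation illustrates the point. It is a sketch under assumptions chosen here (three independent standard-normal cues, hypothetical true weights of 0.6, 0.3 and 0.1, normally distributed error, and a fixed random seed), not a reproduction of Dawes & Corrigan's analysis. It compares the predictive correlation of the correctly weighted composite with that of a composite that weights every cue equally.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)          # fixed seed so the run is repeatable
true_weights = (0.6, 0.3, 0.1)  # hypothetical "optimal" weights

# Each case has three cues monotonically (here linearly) related to the criterion.
cases = [[rng.gauss(0.0, 1.0) for _ in true_weights] for _ in range(2000)]
criterion = [
    sum(w * c for w, c in zip(true_weights, case)) + rng.gauss(0.0, 0.5)  # plus error
    for case in cases
]

optimal_composite = [sum(w * c for w, c in zip(true_weights, case)) for case in cases]
unit_composite = [sum(case) for case in cases]  # every cue weighted equally

r_optimal = pearson(optimal_composite, criterion)
r_unit = pearson(unit_composite, criterion)
```

Under these assumptions the equal-weight composite retains most of the predictive correlation of the correctly weighted one, which is the sense in which a good linear fit says little about whether the weights themselves are right.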

In  other  work,  theoretical  and  methodological  refinements  of  the 
lens  model  have  been  developed  by  Castellan  (52,  53)  and  Stenson  (267). 
Cook  (59)  and  Stewart  & Carter  (268)  have  worked  toward  developing 
interactive  computer  programs  for  policy  capturing.  Mertz  & Doherty 
(195) and Brehmer (37) examined the influence of various task characteristics
on the configurality and consistency of policies. Miller
(197)  demonstrated  that  improper  cue  labels  could  mislead  judges  de- 
spite the  availability  of  adequate  statistical  information  about  cue 
validities. Lichtenstein, Earle & Slovic (173) and Birnbaum (32)
showed  that  even  though  regression  equations  can  be  used  to  describe 
cue-combination  policies,  subjects  often  average  cues,  in  violation 
of  the  additivity  inherent  in  the  equations.  Wiggins  (306)  discussed 
the  problems  of  identifying  and  characterizing  individual  differences 
in  judgmental  policies,  and  Ramanaiah  & Goldberg  (222)  explored  the 
stability and correlates of such differences. McCann, Miller & Moskowitz
(193) examined the problems of capturing policies in particularly
complex  and  dynamic  tasks  such  as  production  planning. 
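
The distinction between adding and averaging cues, noted above, can be shown with two lines of arithmetic. The cue values are hypothetical favorability ratings on a 0-10 scale, and both functions are deliberately simplified (equal weights, no initial impression).

```python
from typing import Sequence


def adding_model(values: Sequence[float]) -> float:
    """Additive integration: each cue contributes its value to the total."""
    return sum(values)


def averaging_model(values: Sequence[float]) -> float:
    """Averaging integration: the judgment is the mean of the cue values."""
    return sum(values) / len(values)


strong_only = [9.0]            # one highly favorable cue
strong_plus_mild = [9.0, 6.0]  # the same cue plus a mildly favorable one

added_before, added_after = adding_model(strong_only), adding_model(strong_plus_mild)
averaged_before, averaged_after = averaging_model(strong_only), averaging_model(strong_plus_mild)
```

Under adding, the mildly favorable cue raises the judgment; under averaging it lowers it. A regression equation fitted to a single cue set cannot by itself tell the two processes apart, which is why set-size manipulations are needed.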

MULTIPLE  CUE  PROBABILITY  LEARNING  Considerable  effort  has  been  invested 
in studying how people learn to make inferences from several probabilistic
cues. Most of this work goes under the label "multiple-cue
probability  learning"  (MCPL)  and  relies  on  the  lens  model  for  concep- 
tual and  analytic  guidance.  Typically,  the  cues  are  numerical  and  vary 
in  their  importance  and  in  the  form  (linear  or  nonlinear)  of  their 
relationship  to  the  criterion  being  judged.  The  criterion  usually 
contains  error,  making  perfect  prediction  impossible.  Because  these 
tasks embody the essential features of diagnostic inference, they are
studied  for  their  potential  applied  significance  as  well  as  their  con- 
tributions to  basic  knowledge. 

Slovic  & Lichtenstein  (254)  reviewed  MCPL  studies  published  prior 
to 1971. They concluded that: (a) subjects can learn to use linear
cues appropriately; (b) learning of nonlinear functions is slow and
especially difficult when subjects are not forewarned that relations
may be nonlinear; (c) subjects are inconsistent, particularly when
task predictability is low; (d) subjects fail to take proper account
of cue intercorrelations; and (e) outcome feedback is not very helpful.

Research  during  the  past  half  decade  has  confirmed  and  extended 
these  conclusions.  Difficulties  people  have  in  coping  with  intercor- 
related  cues  have  been  documented  in  numerous  studies  (8,  9,  178,  236). 
Hammond  and  his  colleagues  (115)  used  the  MCPL  paradigm  to  analyze  the 
effects  of  psychoactive  drugs  on  cognition.  They  found  that  some  drugs 
that  are  used  to  enhance  emotional  control  interfered  with  learning 
and  communication  in  ways  that  may  be  detrimental  to  therapy.  Bjorkman 
(33)  and  Castellan  (54)  reviewed  results  from  studies  using  nonmetric 
cues  and  criteria. 

Other  research  has  worked  toward  developing  a theory  to  explain 
MCPL results in terms of erroneous intuitions about probabilistic tasks,
the manner in which individuals acquire and test hypotheses, and their
cognitive  limitations.  For  example,  Brehmer  (38,  40,  41)  has  studied 
how  subjects  formulate  and  test  hypotheses  as  they  search  for  rules 
that  will  produce  satisfactory  inferences.  Hypotheses  about  the  func- 
tional rule  relating  cues  and  criterion  appear  to  be  sampled  from  a 
hierarchical  set  based  on  previous  experience  and  dominated  by  the 
positive  linear  rule.  Testing  of  hypotheses  about  rules  shows  inade- 
quate appreciation  of  the  probabilistic  nature  of  the  task.  Subjects 
keep  searching  for  deterministic  rules  that  will  account  for  the  ran- 
domness in  the  task;  since  there  are  none,  they  change  rules  frequently 
(i.e.,  become  inconsistent)  and  eventually  resample  rules  they  had 
previously  discarded. 

Even  when  subjects  are  informed  of  the  correct  rules,  they  have 
trouble  applying  them  consistently  (31,  42,  118).  Nonlinear  rules 
are  particularly  hard  to  apply.  Brehmer,  Hammond  and  their  colleagues 
have thus conceptualized inference as a skill analogous to motor behavior:
with both, we can know what we want to do without necessarily
being able to do it.

Dynamic  Decision  Making 

At  the  time  of  Rapoport  & Wallsten's  review,  one  active  research 
area  was  dynamic  decision  making  (DDM),  the  study  of  tasks  in  which 
"decisions  are  made  sequentially  in  time;  the  task  specifications  may 
change  over  time,  either  independently  or  as  a result  of  previous  de- 
cisions; information  available  for  later  decisions  may  be  contingent 
upon  the  outcomes  of  earlier  decisions;  and  implications  of  any  deci- 
sion may  reach  into  the  future"  (224,  p.  345).  The  present  half-decade 
began  promisingly  with  Rapoport  & Burkheimer's  (225)  explication  of 
formal models for deferred decision making and the manner in which
they  might  be  utilized  in  psychological  experiments.  Shortly  there- 
after, Ebert  (77)  reported  finding  no  difference  between  stochastic 
and  deterministic  versions  of  a task  which  Rapoport  (223)  earlier  had 
found  to  differ.  After  that,  relative  silence. 

Several  possible  reasons  for  this  decline  in  interest  come  to 
mind.  The  mathematical  sophistication  of  DDM  may  deter  some  researchers, 
as  may  the  on-line  computer  and  long  start-up  time  often  required. 
Furthermore,  DDM  models  are  so  complex  and  require  so  many  assumptions 
that  the  interpretation  of  experimental  results  is  typically  ambiguous — 
witness the morass of explanations facing Ebert (77) for why his experiment
and Rapoport's produced different results. Kleiter (151) noted
particular  problems  with  creating  cover  stories  that  induce  subjects 
to  accept  the  assumptions  underlying  the  model  and  with  ascertaining 
that  subjects  understood  the  task.  He  also  questioned  "the  metahy- 
pothesis that  human  behavior  is  optimal"  (p.  374),  which  limits  psy- 
chological theories  to  variations  on  the  optimal  model  (e.g.,  using 
subjective  probability  estimates  rather  than  "objective"  relative 
frequencies  or  assuming  a reduced  planning  horizon).  In  his  own  work, 
Kleiter  (152)  has  assessed  people's  planning  horizons  and  has  used  a 
non-normative  variance-preference  model  to  predict  betting  behavior 
in a multi-stage game (154). These predictions relied on the assumption
that people were perfect Bayesian information processors.

A more  active  area  of  DDM  research  deals  with  sequential  infor- 
mation purchasing  or  sampling.  Levine  & Samet  (169)  allowed  subjects 
to  purchase  information  from  three  fallible  sources  until  they  could 
decide  which  of  eight  possible  targets  was  the  object  of  an  enemy 
advance.  They  found  that  information-seeking  decreased  with  conflicting 
and unreliable information, as did accuracy. On the other hand,
Snapper & Peterson (259) reduced the diagnosticity of information
and  found  people  purchasing  more.  Their  subjects  appeared  to  be  un- 
responsive to  changes  in  information  quality  because  of  a policy  of 
purchasing  "intermediate"  amounts  of  information. 

Another  sequential  task  that  has  attracted  some  attention  is 
optional stopping: the decision maker must choose between accepting
a currently available outcome versus sampling further outcomes that
may  be  of  greater  or  lesser  worth.  Although  earlier  research  (see 
225a)  found  that  subjects  performed  well  when  options  were  generated 
by a random but stationary process, Brickman (44) found very poor performance
with options that tended to increase or decrease in value.
In particular, subjects persisted much longer in sampling options with
a descending  than  with  an  ascending  sequence.  Brickman  likened  this 
behavior  to  "throwing  good  money  after  bad."  His  subjects'  "take-the- 
money-and-run"  strategy  with  ascending  series  was  similar  to  that  found 
by  Corbin,  Olson  & Abbondanza  (63).  Their  subjects  seem  to  have  called 
it  quits  as  soon  as  an  option  appeared  that  was  a good  bit  better  than 
its predecessors. Olander (212), too, described satisficing (rather
than  maximizing)  principles  that  may  guide  subjects'  decisions  about 
searching  further. 
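
A stopping rule of the kind these subjects seem to have used can be sketched in a few lines. The rule, the margin and the two option sequences are hypothetical illustrations, not the stimuli or the fitted model of any cited study: the decision maker stops at the first option that beats everything seen so far by at least a fixed margin, and otherwise is left with the last option.

```python
from typing import Sequence


def stopping_index(options: Sequence[float], margin: float) -> int:
    """Index of the first option exceeding all predecessors by `margin`.

    If no option qualifies, the decision maker samples to the end and
    must accept the final option.
    """
    for i in range(1, len(options)):
        if options[i] >= max(options[:i]) + margin:
            return i
    return len(options) - 1


ascending = [10, 12, 20, 25, 30]
descending = [30, 25, 20, 12, 10]

stop_up = stopping_index(ascending, margin=5)     # stops early, forgoing later gains
stop_down = stopping_index(descending, margin=5)  # never triggered, samples to the end

accepted_up = ascending[stop_up]
accepted_down = descending[stop_down]
```

On the rising series the rule takes the money and runs well before the best option appears; on the falling series it never fires, so sampling continues while the options get worse, which parallels the longer persistence Brickman observed with descending sequences.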

Are  Important  Decisions  Biased? 

A coherent  picture  emerges  from  research  described  so  far.  Be- 
cause of  limited  information-processing  capacity  and  ignorance  of  the 
rules  for  optimal  information  processing  and  decision  making,  people's 
judgments  are  subject  to  systematic  biases.  Can  these  results  be  gen- 
eralized from  the  lab  to  the  real  world? 


A number of critics are doubtful. Edwards (80) argued that experimenters,
by denying subjects the necessary tools and providing
neither  the  time  nor  the  guidance  to  find  them,  have  exaggerated  human 
intellectual  limitations.  Winkler  & Murphy  (309)  criticized  laboratory 
experiments  for  being  overly  simplified  and  too  well  structured  when 
compared with the real-world situations they are meant to model. They
suggested that people may perform poorly in the lab because of improper
generalization from their real-world experiences. For example, because
real-world  information  tends  to  be  redundant  and  unreliable,  people 
may  naturally  devalue  the  reliable  information  provided  in  experiments, 
producing  conservatism.  In  addition,  experimental  subjects  may  be 
poorly  motivated  and  forced  to  deal  with  unfamiliar  tasks  and  substan- 
tive areas,  without  adequate  training — even  in  the  meaning  of  the  response 
mode (121a).

In  rebuttal,  one  could  argue  that  laboratory  studies  may  show 
subjects  at  their  best.  Use  of  unfamiliar  substantive  topics  may  free 
them  from  preconceived  notions  that  could  prejudice  their  judgments. 

Provision  of  all  information  necessary  for  an  optimal  decision  (and 
little else) is, as noted by Winkler & Murphy (309), a boon seldom offered
by the real world. It may create demand characteristics forcing
subjects toward optimal responses (90, 97, 302). An alternative rebuttal
is that there are many real-life situations which are quite
like  the  laboratory,  forcing  people  to  make  a decision  without  the 
benefit  of  training  and  experience.  People  typically  buy  cars  and 
houses  and  decide  to  marry  and  divorce  under  such  circumstances,  func- 
tioning as  their  own  best  approximation  to  experts. 

Perhaps the best way to resolve this argument is to look at the
evidence.

EXPERTS IN THE LABORATORY The robustness of biases is shown in formal
experiments using experts as subjects. As examples: Tversky & Kahneman's
(284) "law of small numbers" results were obtained with statistically
savvy psychologists. Las Vegas casino patrons showed the same
irrational  reversals  of  preferences  for  gambles  as  did  college  students 
(176,  177).  Bankers  and  stock  market  experts  predicting  closing  prices 
for  selected  stocks  showed  substantial  overconfidence  and  performed 
so  poorly  that  they  would  have  done  better  with  a "know-nothing"  strat- 
egy (265).  Lichtenstein  & Fischhoff  (174)  found  that  the  probability 
assessments  of  psychology  graduate  students  were  no  better  for  questions 
within  their  area  of  expertise  than  for  questions  relating  to  general 
knowledge.
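
Overconfidence in probability assessments is usually measured by comparing stated probabilities with the proportion of answers that turn out to be correct. The sketch below shows one simple way to do this; the assessment data are invented for illustration and the scoring is a simplification, not the procedure of the studies cited.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Assessment = Tuple[float, bool]  # (stated probability of being correct, was correct)


def calibration_table(assessments: List[Assessment]) -> Dict[float, float]:
    """Observed proportion correct at each stated probability level."""
    groups: Dict[float, List[bool]] = defaultdict(list)
    for stated, correct in assessments:
        groups[stated].append(correct)
    return {p: sum(hits) / len(hits) for p, hits in sorted(groups.items())}


def overconfidence(assessments: List[Assessment]) -> float:
    """Mean stated probability minus overall proportion correct."""
    mean_confidence = sum(p for p, _ in assessments) / len(assessments)
    proportion_correct = sum(c for _, c in assessments) / len(assessments)
    return mean_confidence - proportion_correct


# Hypothetical assessor: 14 answers at three confidence levels.
answers: List[Assessment] = (
    [(0.6, True)] * 2 + [(0.6, False)] * 2
    + [(0.8, True)] * 3 + [(0.8, False)] * 2
    + [(1.0, True)] * 4 + [(1.0, False)] * 1
)

table = calibration_table(answers)
bias = overconfidence(answers)
```

A well-calibrated assessor would show observed proportions equal to the stated probabilities and a bias near zero; in this invented example every level falls short, so the bias is positive.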

The  "experts"  in  these  studies  were  selected  on  the  basis  of  what 
they  knew  about  the  subject  area,  not  what  they  knew  about  judgment 
and  decision  making  (i.e.,  they  were  substantive  rather  than  normative 
experts).  Can  normative  experts  be  created  in  the  laboratory  by  proper 
training?  The  evidence  is  mixed,  suggesting  either  that  some  biases 
are  robust  or  that  we  have  failed  to  understand  the  psychology  of  our 
subjects well enough to assist them.

OUT  IN  THE  FIELD  With  the  exception  of  some  well-calibrated  weather 
forecasters  (described  below),  similar  biases  have  been  found  in  a 
variety of field studies. For example, Brown, Kahr & Peterson (49)
observed  overestimation  in  the  probability  assessments  of  military 
intelligence  analysts.  Kidd  (149)  found  that  engineers  for  the  United 
Kingdom's  Central  Electricity  Generating  Board  consistently  underesti- 
mated repair  time  for  inoperative  units.  Bond  (34)  observed  suboptimal 
play among 53 blackjack players at four South Lake Tahoe casinos.
"By wagering small bets in a sub-fair game, [these] blackjack gamblers
practically guaranteed loss of their betting capital to the casinos"
(p. 413). Flood-plain residents misperceive the probability of floods
in  ways  readily  explained  in  terms  of  availability  and  representative- 
ness (253).  Surveying  research  published  in  psychological  and  educa- 
tional journals,  Cohen  (56)  and  Brewer  & Owen  (43)  found  that  investi- 
gators regularly  design  experiments  with  inadequate  statistical  power, 
reflecting  a belief  in  the  "law  of  small  numbers"  (284).  Misinterpre- 
tation of  regression  toward  the  mean  appears  to  be  as  endemic  to  some 
areas  of  psychology  (101)  as  to  Kahneman  & Tversky's  (139)  subjects. 
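
The quoted remark about small bets in a sub-fair game follows from the classical gambler's-ruin result. The sketch below is a simplification assumed for illustration: every bet is one unit at even money with a fixed win probability, which real blackjack is not, and the 0.49 win probability and the stakes are hypothetical.

```python
def prob_reach_goal(capital_units: int, goal_units: int, p_win: float) -> float:
    """Probability of reaching `goal_units` before ruin with one-unit even-money bets.

    Classical gambler's-ruin formula; `capital_units` and `goal_units` are
    measured in multiples of the bet size.
    """
    if p_win == 0.5:
        return capital_units / goal_units
    ratio = (1.0 - p_win) / p_win
    return (1.0 - ratio ** capital_units) / (1.0 - ratio ** goal_units)


# A player with $100 who hopes to double it in a game won 49% of the time.
large_bets = prob_reach_goal(10, 20, 0.49)    # $10 bets: 10 units, goal 20 units
small_bets = prob_reach_goal(100, 200, 0.49)  # $1 bets: 100 units, goal 200 units
fair_game = prob_reach_goal(10, 20, 0.5)
```

Under these assumptions, shrinking the bet multiplies the number of wagers needed, and the house edge has that many more chances to operate, so the chance of reaching the goal falls sharply as bets get smaller.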

A major  legal  debate  concerns  the  incarceration  of  individuals 
for  being  "dangerous."  What  little  evidence  there  is  regarding  the 
validity  of  dangerousness  judgments  indicates  substantial  "over-pre- 
diction," incarceration  of  people  who  would  not  have  misbehaved  had 
they  been  set  free  (72,  242).  Although  this  bias  may  reflect  a greater 
aversion  to  freeing  someone  who  causes  trouble  than  to  erring  in  the 
other  direction,  some  observers  have  attributed  it  to  judgmental  prob- 
lems such  as  failure  to  consider  base  rates,  ignorance  of  the  problems 
of predicting rare events, perception of non-existent correlations,
and insensitivity to the reliability of evidence (198a).
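
The role of base rates in over-prediction can be illustrated with Bayes' rule. The numbers below are hypothetical and chosen only to show the arithmetic; they are not estimates from the studies cited.

```python
def positive_predictive_value(base_rate: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Probability that a person predicted to be dangerous actually is (Bayes' rule)."""
    true_positives = base_rate * hit_rate
    false_positives = (1.0 - base_rate) * false_alarm_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical case: 5% of the population would misbehave if released; the judge
# flags 80% of those who would and 20% of those who would not.
ppv = positive_predictive_value(base_rate=0.05, hit_rate=0.80, false_alarm_rate=0.20)
```

With a rare outcome, even a judge who is right about 80% of each group is wrong about most of the people flagged: in this example fewer than one in five of those predicted to be dangerous would actually have misbehaved.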

Jurors appear to have great difficulty ignoring first impressions
of the accused's personality, pretrial publicity, and other forms of
inadmissible evidence (46, 270), tendencies which may represent both
hindsight and anchoring biases (92). The vagaries of eyewitness testimony
and witnesses' overconfidence in erroneous knowledge are quite
well known (51, 180).

Zieve (319) has described at length the misinterpretation and
abuse of laboratory test results by medical clinicians. Although some
of these errors are due to ignorance, others reflect naive statistical
reasoning. A classic case of the "law of small numbers" is Berkson,
Magath & Hurn's (25) discovery that aspiring lab technicians were expected
by their instructors to show greater accuracy in performing
blood cell counts than was possible given sampling variation. These
instructors would marvel that the best students (those who would not
cheat) had the greatest difficulty in producing acceptable counts.

In  a phenomenological  study  of  orthopedic  surgeons,  Knafl  & Burkett 
(155)  found  a variety  of  simplifying  heuristics,  some  of  them  in  the 
form  of  general  treatment  philosophies  (e.g.,  "don't  cut  unless  you 
absolutely  have  to"). 

The  immense  decisions  facing  our  society  (e.g.,  nuclear  power) 
have  prompted  the  development  of  formal  analytic  techniques  to  replace 
traditional, error-prone, "seat-of-the-pants" decision making. Fischhoff
(91) reviewed a variety of cost-benefit analyses and risk assessments
performed with these techniques and found them liable to omissions
of important consequences reflecting availability biases. In case
studies of policy analyses, Albert Wohlstetter (311) found that American
intelligence analysts consistently underestimated Soviet missile
strength, a bias possibly due to anchoring. Roberta Wohlstetter's
(311a)  study  of  American  unpreparedness  at  Pearl  Harbor  found  the 
U.S.  Congress  and  military  investigators  guilty  of  hindsight  bias 
in  their  judgment  of  the  Pearl  Harbor  command  staff's  negligence. 

Even  if  policy  analyses  are  performed  correctly,  they  still  must 
be  explained  (sold?)  to  the  public.  In  the  area  of  natural  hazard 
management,  well-founded  government  policies  have  foundered  because 
people do not perceive flood hazards the way policy makers expect them
to (253). For example, the National Flood Insurance Program has had
only  limited  success  because  the  endangered  people  will  not  buy  the 
highly  subsidized  and  normatively  very  attractive  insurance  offered 
them  (160). 

THE  ULTIMATE  TEST  "If  behavioral  decision  theory  researchers  are  so 
smart,  why  aren't  they  rich?" 

"They're  not  in  business." 

"Then  why  aren't  people  who  are  in  business  falling  over  themselves 
to  utilize  their  results?" 

Well,  although  psychological  research  has  not  swept  the  world's 
decision makers like wildfire, it has kindled some non-negligible interest.
The concern weather forecasters and decision analysts have
shown for research in probability assessment is described elsewhere
in this review. The Department of Defense is developing sophisticated
decision aids to relieve military commanders of the need to integrate
information in their heads (148). U.S. intelligence analysts have
shown interest in the use of Bayesian approaches for processing of
intelligence information (79a, 147). Researchers in accounting¹ (see
also 14) have advocated considering information-processing limits in
designing financial reports. The American College of Radiology has
launched a massive "Efficacy Study" to see how radiologists use the
probabilistic information from x-rays. Bettman (29), Armstrong, Kendall
& Ross (11) and others have argued that legislation intended to
provide consumers with necessary information (e.g., unit pricing, true
interest rates) must consider how those consumers do, in fact, process
information.

¹ Climo, T. A. Cash flow statements for investors, unpublished. Uni-

DECISION AIDS

"What do you do for a living?"

"Study decision making."

"Then you can help me. I have some big decisions to make."

"Well,  actually  ..." 

That sinking feeling of inadequacy experienced by many of us doing
psychological research in decision making is probably not felt by most
experts in decision analysis, multi-attribute utility theory or other
decision aiding techniques. Proponents of these approaches have remedies
for what ails you: techniques to help users make better decisions
in any and all circumstances.

Most of these decision aids rely on the principle of divide and
conquer. This "decomposition" approach is a constructive response to
the  problem  of  cognitive  overload.  The  decision  aid  fractionates  the 
total  problem  into  a series  of  structurally-related  parts,  and  the  de- 


uations,  people  may  really  know  what  they  want  to  do  better  than  they 
know  how  to  assess  the  inputs  required  for  the  decision  aid. 

Decision aids which do not rely on decomposition, but instead require
the decision maker to state preferences among whole, nonfractionated
alternatives,  are  here  called  "wholistic."  The  models  in  these  aids 
are  used  to  smooth  or  correct  the  wholistic  judgments,  and  to  partial 
them  into  components. 

Since several of the decision aids rely on assessments of probability,
we start this section with a review of probability elicitation
techniques.

Assessing  Probabilities 

What is the best way to assess probabilities? Spetzler & Stael
von Holstein (260) have written an excellent description of how the
Decision Analysis Group at Stanford Research Institute approaches this
problem. They recommended (a) carefully structuring the problem with
the client ("mental acrobatics should be minimized," p. 343), (b)
minimizing biases that might affect the assessor, (c) using personal
interviews rather than computer-interactive techniques with new clients,
and (d) using several different elicitation methods, both direct and
indirect. Their favorite elicitation technique is a reference bet involving
a "probability wheel," a disk with two differently colored
sectors whose relative size is adjustable. The assessor is offered
two bets, each with the same payoff. One bet concerns the uncertain
quantity (you win if next year's sales exceed $X); the other bet concerns
the disk (you win if the pointer lands in the orange sector
after the disk is spun). The relative size of the two sectors is
varied until the assessor is indifferent between the two bets. The
proportion of the disk which is orange is taken as the probability of
the event stated in the other bet.
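
The adjustment of the wheel can be pictured as a simple search. The sketch below is only an illustration: the source says the sector size is "varied until the assessor is indifferent" and does not prescribe bisection, and the function name `elicit_with_wheel`, the callback `prefers_wheel`, and the simulated assessor are all hypothetical.

```python
def elicit_with_wheel(prefers_wheel, tolerance=0.01):
    """Narrow in on the orange proportion at which the assessor is
    indifferent between the wheel bet and the event bet.

    prefers_wheel(orange) is a hypothetical callback: it returns True when
    the assessor would rather bet on the wheel with that orange proportion.
    Bisection is an assumed search strategy, not the interviewers' protocol.
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        orange = (low + high) / 2
        if prefers_wheel(orange):
            high = orange  # wheel bet too attractive: shrink the orange sector
        else:
            low = orange   # event bet preferred: enlarge the orange sector
    return (low + high) / 2


# Simulated assessor whose (unstated) degree of belief in the event is 0.30.
hidden_belief = 0.30
estimate = elicit_with_wheel(lambda orange: orange > hidden_belief)
print(f"elicited probability: {estimate:.3f}")
```

Note that the assessor only ever states a preference between two bets, which is why the method needs no numerical response.
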

Despite the appeal of this method (it is formally justified within
axiomatic models of subjective probability, does not require the assumption
that the utility of money is linear with money, and requires
no numerical response from the assessor), we have been unable to find
any research on its use.

DISCRETE EVENTS Comparisons among several direct methods for assessing
the probabilities of discrete events (probabilities vs. odds vs. log
odds) have failed to identify one clearly preferable response mode
(35, 73a, 105). Beach (22) found a mean within-subject correlation
of only .49 between probabilities assessed directly and indirectly
(via bids for bets). DuCharme & Donnell (76) found equally conservative
inferences using odds, probabilities, and an indirect method similar
in concept to, but more complicated than, the reference bet method
discussed by Spetzler & Stael von Holstein (260).

These studies focused on the assessment of middle-range probabilities;
even less is known about assessing very large or very small
probabilities.  Slovic,  Fischhoff  & Lichtenstein  (251)  have  shown  that 
subjects  grossly  misuse  odds  of  greater  than  50:1.  Selvidge  (241)  has 
made  some  common-sense  suggestions  for  assessing  very  small  probabilities. 
She  advised  first  structuring  and  decomposing  the  problem,  then  ranking 
various  unlikely  events,  and  finally  attaching  numbers  to  those  events 
with  the  help  of  reference  events  (like  dying  in  various  rare  accidents). 

Once  you've  assessed  a probability,  how  good  is  it?  When  there 
is an agreed-upon "true probability," as with bookbag and poker chip
tasks, the assessed probability may be compared with the "truth." But
more often, the assessed probability states a degree of belief in some
proposition, so that no criterion "true" probability value exists.

One test of such probabilities is coherence, that is, do they abide
by the axioms of probability (290, 316)? A second kind of validity,
called calibration, may be examined if one collects a large number
of assessments for which the truth of the associated propositions is
known. For discrete propositions, calibration means that for every
collection of propositions assigned the same numerical probability,
the hit rate or proportion which actually are true should be equal
to the assessed probability. The research on calibration has recently
been extensively reviewed (175), so only a summary of findings will
be given here: (a) Experienced weather forecasters, when performing
their customary tasks, are excellently calibrated. (b) Everybody else
stinks. (c) People are overconfident except with very easy tasks.
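
The definition of calibration for discrete propositions can be made concrete with a small tabulation. This is a minimal sketch; the function name `calibration_table` and the data are invented for illustration and do not come from the studies reviewed.

```python
from collections import defaultdict


def calibration_table(assessments):
    """Group (stated_probability, was_true) pairs by stated probability.

    Returns {stated_probability: (number_of_items, hit_rate)}. An assessor
    is calibrated when each hit rate equals its stated probability.
    """
    groups = defaultdict(list)
    for prob, was_true in assessments:
        groups[prob].append(1 if was_true else 0)
    return {p: (len(hits), sum(hits) / len(hits))
            for p, hits in sorted(groups.items())}


# Hypothetical assessor: calibrated at .6, overconfident at .9.
data = ([(0.6, True)] * 6 + [(0.6, False)] * 4 +
        [(0.9, True)] * 7 + [(0.9, False)] * 3)
table = calibration_table(data)
for p, (n, hit_rate) in table.items():
    print(f"stated {p:.1f}  n={n}  hit rate {hit_rate:.2f}")
```

In this made-up example, the propositions given probability .9 are true only 70% of the time, which is the overconfidence pattern summarized in (c).
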

UNCERTAIN QUANTITIES The most common technique for assessing probability
density functions across uncertain quantities is the fractile
method. An assessor who names a value of an uncertain quantity as its
.25 fractile, for example, is saying that there is just a 25% chance
that the true value will be smaller than that specified value. Stael
von Holstein (264) and Vlek (290) have studied the consistency between
the fractile method and other elicitation methods. Stael von Holstein
found that even after four sessions, most subjects were inconsistent.
Vlek's subjects showed greater consistency.

Continuous  probability  density  functions  can  also  be  tested  for 
calibration.  Assessors  are  calibrated  when,  over  many  such  assessments, 
the  proportion  of  true  answers  falling  below  a given  fractile  is  equal 
to that fractile. The evidence on calibration (175) may be summarized
as follows: (a) A strong and nearly universal bias exists: the assessed
distributions are too tight, so that from 20% to 50% of the
true values, instead of 2%, fall outside of the .01 to .99 range of
the distributions; (b) training improves performance.
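
The "too tight" finding can be checked by counting how often the true value falls outside an assessor's .01 to .99 range. The sketch below assumes each assessment is recorded as a hypothetical triple (.01 fractile, .99 fractile, true value); the function name `tail_surprise_rate` and the numbers are illustrative only.

```python
def tail_surprise_rate(assessments):
    """Fraction of true values falling outside the stated .01-.99 range.

    assessments: list of (fractile_01, fractile_99, true_value) triples.
    A calibrated assessor should be surprised about 2% of the time.
    """
    outside = sum(1 for low, high, truth in assessments
                  if truth < low or truth > high)
    return outside / len(assessments)


# Hypothetical assessments of five uncertain quantities.
data = [(10, 20, 15), (100, 200, 250), (5, 8, 6), (0, 1, 2), (30, 60, 45)]
rate = tail_surprise_rate(data)
print(f"outside the .01-.99 range: {rate:.0%} (calibrated target: 2%)")
```
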


SCORING RULES Scoring rules are functions which assign a score to
an assessed probability (or a vector of probabilities) as a function
of both the true outcome of the event being assessed and the size of
the probability associated with the true outcome. Such rules are
strictly proper if and only if the only strategy for maximizing one's
expected score is to tell the truth, that is, to state one's true belief
without hedging. Usually the only rules considered are those which reward
expertise: given that one tells the truth, the more one knows, the
larger the score (an exception is Vlek's [291] fair betting game).
Scoring rules have recently been discussed by Murphy & Winkler (205,
206) and by Shuford & Brown (50, 246).
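
What "strictly proper" means can be seen by comparing two rules for a single binary event. The sketch below is an illustration under stated assumptions: the quadratic rule is a familiar strictly proper rule, the linear rule is included as an improper contrast, and the grid search and all function names are hypothetical choices made for this example.

```python
def expected_score(rule, belief, report):
    """Expected score when the true belief is `belief` but `report` is stated."""
    return belief * rule(report, 1) + (1 - belief) * rule(report, 0)


def quadratic(report, outcome):
    """Quadratic rule: 1 minus squared error (strictly proper)."""
    return 1 - (outcome - report) ** 2


def linear(report, outcome):
    """Linear rule: the probability given to what happened (not proper)."""
    return report if outcome == 1 else 1 - report


def best_report(rule, belief, steps=100):
    """Report on a 0.01 grid that maximizes the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_score(rule, belief, r))


belief = 0.7
best_quadratic = best_report(quadratic, belief)
best_linear = best_report(linear, belief)
print("quadratic rule rewards reporting", best_quadratic)
print("linear rule rewards reporting", best_linear)
```

Under the quadratic rule the best report is the assessor's actual belief; under the linear rule the assessor does best by hedging all the way to certainty.
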

Scoring  rules  may  be  used  for  three  purposes.  The  first  use  is 
as  an  indirect  method  for  measuring  probabilities.  A list  of  bets  is 
generated  from  the  scoring  rule.  Each  bet  gives  two  numbers,  how  much 
the  assessor  wins  if  the  event  in  question  occurs  and  how  much  is  lost 
if  it  does  not.  The  assessor  selects  his  or  her  preferred  bet  from 
the list; this choice implies a probability. Jensen & Peterson (136)
and Seghers, Fryback & Goodman (240) found this method unsatisfactory;
their subjects were apparently using other strategies rather than trying
to maximize winnings.
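
The first use can be sketched as follows, assuming a quadratic rule and an assessor who maximizes expected winnings (the very assumption the studies above found subjects did not satisfy). The stake of 100, the grid of nine bets, and the function names are hypothetical.

```python
def bet_list(stake=100, steps=10):
    """Build (implied_probability, win_if_event, loss_if_not) triples.

    Payoffs follow a quadratic rule shifted so that the second number is a
    loss; this particular scaling is an assumption made for illustration.
    """
    bets = []
    for i in range(1, steps):
        r = i / steps
        win = stake * (1 - (1 - r) ** 2)
        lose = stake * r ** 2
        bets.append((r, round(win, 2), round(lose, 2)))
    return bets


def preferred_bet(bets, belief):
    """Bet with the highest expected winnings for a given degree of belief."""
    return max(bets, key=lambda b: belief * b[1] - (1 - belief) * b[2])


bets = bet_list()
implied, win, lose = preferred_bet(bets, belief=0.8)
print(f"chosen bet: win {win}, lose {lose}; implied probability {implied}")
```

Because the underlying rule is proper, an expected-value maximizer picks the bet whose implied probability matches his or her belief; an assessor using some other strategy picks a bet that implies nothing reliable.
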

The  second  use  of  scoring  rules  is  to  educate  assessors  about 
probability  assessments  made  with  other  methods.  Several  studies  have 
used scoring rule feedback (246, 263, 308) without reporting whether
it helped. Hoffman & Peterson (120) reported that subjects who received
such feedback improved their scores on a subsequent task, but
Vlek (290) found no such improvement. Scoring rules are now widely
used by weather forecasters, and this may be why they are so well
calibrated (175). Murphy & Winkler (207) reported that a majority
of 689 weather forecasters (a) described themselves as being uncomfortable
thinking in probabilistic terms (though their job is to report
probabilities, and they do it well) and (b) rejected the idea
that their forecasts can be properly evaluated by a single quantitative
measure like a scoring rule (though many had had experience with such
feedback).

The  third  use  for  scoring  rules  is  to  evaluate  assessors.  When 
all  assessors  are  working  in  the  same  situation,  the  assessor  with 
the  highest  score  is  the  best  assessor.  However,  not  all  situations 
are  equal;  there  is  more  uncertainty  in  forecasting  rain  in  Chicago 
than  in  Oregon.  Thus,  Oregon  forecasters  will  earn  higher  scores 
simply  because  of  where  they  work.  Murphy  (203)  has  shown  that  the 
Brier  scoring  rule  (the  one  used  in  meteorology)  may  be  partitioned 
into three additive components, measuring (a) the inherent uncertainty
in the task; (b) the resolution of the assessor (the degree to
which the assessor can successfully assign probabilities different
from the overall hit rate); and (c) the assessor's calibration. None
of the components is itself a proper scoring rule, but the difference
between the total score and the inherent uncertainty component is
proper, and this difference could be used to compare assessors in different
situations (204).
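
The partition can be sketched for a single binary event as follows. The code assumes the commonly stated form of the decomposition, in which the Brier score is a penalty (lower is better) and equals uncertainty minus resolution plus a calibration term; the function name and the forecast data are hypothetical.

```python
from collections import defaultdict


def brier_partition(forecasts, outcomes):
    """Return (brier, uncertainty, resolution, calibration) for binary events.

    Forecasts are grouped by their exact stated value; the identity
    brier = uncertainty - resolution + calibration then holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    calibration = sum(len(obs) * (f - sum(obs) / len(obs)) ** 2
                      for f, obs in groups.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return brier, uncertainty, resolution, calibration


# Hypothetical forecaster: says .2 on five days (rain once), .8 on five (rain four times).
forecasts = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
brier, uncertainty, resolution, calibration = brier_partition(forecasts, outcomes)
print(f"Brier {brier:.2f} = {uncertainty:.2f} - {resolution:.2f} + {calibration:.2f}")
print(f"score net of task uncertainty: {brier - uncertainty:.2f}")
```

The last line corresponds to the difference between the total score and the inherent uncertainty component, the quantity suggested for comparing assessors who work in different situations.
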

The astute reader will note that the research does not provide
an adequate answer to the question asked at the start of this section:
What is the best way to assess probabilities? In addition, the research
has yielded few theoretical ideas. Only Pitz (219) has speculated
on the cognitive processes underlying probability assessment.
Finally, although a few studies have noted that training improves
performance in eliciting probabilities, a definitive long-range learning
study is still needed.

Multi-attribute  Utility  Theory 

Suppose  you  must  choose  one  object  or  course  of  action  from  a 
set.  Each  object  or  action  is  describable  in  terms  of  a number  of 
dimensions  or  attributes  of  value  to  you,  and  the  outcomes  of  your 
choice are certain. Then multi-attribute utility theory (MAUT) prescribes
that you compute, for each object j, the following weighted
utilities, summed across the attributes i:

$$\mathrm{MAU}_j = \sum_i w_i u_{ij},$$

where $w_i$ is the relative importance of the i'th attribute and $u_{ij}$ is
the utility of the j'th object on the i'th attribute. For example,
when choosing a car, $w_i$ might be the importance of design, and $u_{ij}$
would indicate how beautifully designed car j is. The theory prescribes
that you choose the car with the largest MAU. While this
model is the most common, variants exist which incorporate additional
features such as uncertainty, multiplicativity (rather than additivity)
of the weighted utilities, time factors, and the possibility that your
choice will affect others (293).
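
The additive model amounts to a weighted sum per object followed by picking the maximum. A minimal sketch, in which the attributes, weights, and utilities for two cars are invented purely for illustration:

```python
def mau(weights, utilities):
    """Additive multi-attribute utility: sum over attributes of w_i * u_ij."""
    return sum(weights[attr] * utilities[attr] for attr in weights)


# Hypothetical weights (relative importance) and utilities on a 0-1 scale.
weights = {"design": 0.5, "economy": 0.3, "safety": 0.2}
cars = {
    "car_a": {"design": 0.9, "economy": 0.4, "safety": 0.6},
    "car_b": {"design": 0.3, "economy": 0.9, "safety": 0.8},
}

scores = {name: mau(weights, utils) for name, utils in cars.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: MAU = {score:.2f}")
print("prescribed choice:", best)
```

The variants mentioned above (multiplicative forms, uncertainty, time factors) would replace the weighted sum in `mau`; they are not sketched here.
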

MAUT  is  a decision  aid  strongly  grounded  in  theory.  The  axioms 
of the theory lead to the models, to methods for measuring the utilities
and weights, and to specified tests which show which of the models
is  applicable.  MAUT  models  have  been  extensively  developed  in  the 
last  five  years  (94,  95,  96,  141,  143,  233,  234).  If  these  sources 
are  too  technical,  the  review  papers  by  MacCrimmon  (186),  Fischer 
(86,  88),  von  Winterfeldt  & Fischer  (296),  Humphreys  (131),  and  Huber 
(129a)  may  be  helpful. 

ASSESSMENT TECHNIQUES The first step in constructing a MAU is to list
the alternatives. Techniques for doing this are rarely discussed.
Among those who have faced the problem, some have used the Delphi
technique (e.g., 102, 211). Humphreys & Humphreys (132) suggested
using George Kelly's repertory grid technique. Dalkey, Lewis & Snyder
(65) proposed evaluating diverse problems (e.g., job choice, modes
of transportation) not on the basis of their apparent attributes but
on a common set of attributes reflecting quality of life (e.g., security,
fun, freedom). Beach et al. (23) described an extensive interviewing
technique, involving several interactions with different
decision makers, to arrive at a list of attributes.

It  seems  obvious  that  the  omission  of  an  important  attribute  can 
seriously  alter  the  results  of  a MAUT  application.  However,  Aschen- 
brenner  & Kasubek  (12)  found  reasonably  similar  results  for  preference 
among apartments from MAU analyses based on two different, only partially
overlapping sets of attributes.

Weights and utilities can be assessed either directly or indirectly.
Direct approaches, which are simple but not theoretically
justified, include ranking or rating scales, or just asking the assessor
for the relevant numbers. For utilities, the assessor may be presented
with graph paper and asked to sketch a curve. Utility functions may
also be derived by constructing indifference curves for pairs of
variables (189, 190); these methods are lengthy, tedious, and clearly
impractical when there are many variables. After two indifference
curves for the same pair of variables are assessed, a "staircase"
method can be used by the analyst to uncover the utility curves for
each of the variables, assuming that the variables are value independent
(see 156, p. 57-61).

Indirect methods are justified within the theory, but are exceedingly
complex. They rely on a comparison between a gamble and a sure
thing, and thus introduce probabilities into an otherwise riskless situation.
For example, to assess the weight of one attribute from a
set of 14 attributes describing apartments (such as number of bedrooms,
general cleanliness, etc.), the analyst says, "Apartment A has the best
(most preferred) level of all 14 attributes. Apartment B has the worst
level of all 14 attributes. Apartment C has the best level on one
attribute and the worst level on each of the other 13. State a probability
p such that you are indifferent between receiving C for sure
versus receiving a gamble wherein you will obtain A with probability
p and B with probability (1-p). What is the value of p that makes you
indifferent?" The value of p that you name is the weight; such a
question must be asked for each attribute.

The two indirect methods for assessing utilities are similar to
the indirect method for assessing weights, except that "Apartment C"
now has an intermediate level for one attribute, and the worst level
for all others. In the variable-probability method, as with assessing
weights, the task is to name a probability that makes the assessor
indifferent between the sure thing (Apartment C) and the gamble. In
the fixed-probability method, the probabilities associated with the
gamble are held constant at (1/2, 1/2), and the assessor must name that
intermediate value on one attribute of the sure thing which leads to
indifference. In either case, one answer gives only one point on the
utility curve, so that several responses are required to estimate its
shape, for each attribute.

Kneppreth et al. (156) have written an excellent review of the
methods for assessing utilities, explaining each method in detail,
noting advantages and disadvantages, and referencing relevant research.
That research has been unsystematic and allows no clear conclusions.
Perhaps future researchers should model their work on a study by
Vertinsky & Wong (289). Comparing an indifference curve method with
the indirect fixed-probability method, they looked at test-retest reliability
and a host of other indices, including the acceptance of
particular rationality axioms, realism of the task, confidence in the
method, bias in the interpretation of probability, and a measure of
the width of an indifference band across the variables. They found
that the indirect method was more reliable and easier for the subjects,
while the indifference curve technique predicted more subsequent choices.

ISSUES In MAUT, two issues are paramount. The first is: Is it valid?
Early research in the use of MAUT frequently involved correlating the
results of the model with unaided wholistic judgments of the same situations
made by the same subjects (e.g., 130, 132, 294, and earlier papers
referenced in the reviews mentioned above). A high correlation between
the model and the wholistic judgments, the usual result, was taken
as evidence that the model was valid. This conclusion seems faulty to
us. If unaided wholistic preferences are good enough to constitute
criteria for a decision aid like MAUT, who needs the decision aid?
Furthermore, a decade or more of research has abundantly documented
that humans are quite bad at making complex unaided decisions (248);
it could thus be argued that high correlations with such flawed judgments
would suggest a lack of validity. More sophisticated approaches
have been taken by Fischer (87), who showed greater agreement among
three different decomposition procedures than among three different
wholistic procedures, and by Newman (208), who proposed applying
Cronbach's (64) theory of generalizability to the problem of validating
MAUT techniques.

But most practitioners and theorists approach the validity question
as follows: the theory specifies the models, the assessment procedures,
and the tests for choosing which model applies. Thus, if you
accept  the  axioms  (yes,  I do  want  my  choices  to  be  transitive;  I should 
not  be  swayed  by  irrelevant  alternatives,  etc.)  and  pass  the  tests, 
then  you  can  be  assured  that  you  are  doing  the  right  thing.  There  is 
no  remaining  validity  question. 

The  second  issue  concerns  error.  Indirect  elicitation  techniques 
for  both  weights  and  utilities  are,  as  previously  noted,  quite  complex, 
but  theoretically  justifiable.  The  direct  methods,  in  contrast,  seem 
easier,  but  are  theoretically  unjustified.  If  one  assumes  that  the 
decision  maker  has  underlying  weights,  utilities,  and  preferences, 
which  approach,  direct  or  indirect,  elicits  these  underlying  values 
with  least  error?  Von  Winterfeldt  (293)  discussed  but  did  not  resolve 
this issue. Practitioners can (and often do) perform sensitivity analyses
(how much can I change this parameter before the decision changes?).
Such sensitivity analyses will identify potential problems of measurement,
but not solve them.

The tests which are used to determine which MAUT model is applicable
are equally complex. The test for additivity uses the weights
derived from the indirect method. If the weights across all the attributes
sum to 1.0, an additive model may be used. Otherwise, a
multiplicative model is used. No error theory is available to tell
you whether a sum of, say, 1.4 is "close enough" to 1.0 to justify
an additive model. An alternative, and seemingly easier, test is available
for additivity (see 296, p. 70). Unfortunately, no alternatives
are available for two other necessary tests. These tests are for two
kinds of utility independence (called "preferential independence" and
"utility independence" by Keeney [142], and "WCUI" and "SCUI" by others
[see 296]). The following question, with reference to the location
of the Mexico City airport (142), is just the starting point for these
tests: "How many people seriously injured or killed per year, call
that number x, makes you indifferent between the option: [x injured
or killed and 2500 persons subjected to high noise levels] and the
option: [one person injured or killed and 1,500,000 subjected to high
noise levels]?" Several such questions must be asked for each attribute
and for all pairs of attributes. The frequent avoidance of these tests
may not reflect laziness, but a genuine suspicion that using an unjustified
model may lead to fewer errors than choosing a model on the
basis of confused responses to complex questions such as these. As
von Winterfeldt (293) has noted, "even after you go through the process
of model elimination and selection, you will still have to make up your
mind about the possible tradeoffs between assessment error and modeling
error" (p. 65).
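
The additivity test described above reduces to summing the indirectly assessed weights and comparing the total with 1.0. In the sketch below the tolerance is an arbitrary assumption introduced only to make the comparison executable; as the text notes, no error theory says how close is close enough. The attribute names and probabilities are hypothetical.

```python
def additivity_check(indifference_probs, tolerance=0.05):
    """Sum the indirectly assessed weights and compare the total with 1.0.

    indifference_probs maps each attribute to the probability p named by
    the assessor. `tolerance` is an arbitrary, hypothetical cutoff.
    """
    total = sum(indifference_probs.values())
    return total, abs(total - 1.0) <= tolerance


# Hypothetical indifference probabilities for three apartment attributes.
probs = {"bedrooms": 0.40, "cleanliness": 0.35, "rent": 0.50}
total, additive_ok = additivity_check(probs)
model = "additive" if additive_ok else "multiplicative"
print(f"weights sum to {total:.2f}: {model} model indicated")
```

Changing `tolerance` changes the verdict for borderline sums, which is exactly the tradeoff between assessment error and modeling error raised in the quotation.
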

The  flavor  of  the  indirect  assessment  methods  and  the  three  tests 
mentioned above may be appreciated by reading 54 pages of dialogue
between an analyst (Keeney) and an expert as they evaluate alternatives
for the production of electrical energy (144).

RECENT  RESEARCH  The  "new  look"  in  MAUT  research  is  to  explore  its 
uses.  Can  it  be  done?  What  problems  are  encountered?  What  can  be 
learned  from  applying  MAUT?  Gardiner  & Edwards  (102)  showed  that  in 
a highly  controversial  issue  (coastal  land  development)  two  groups  of 
experts (developers and conservationists) showed notably less disagreement
about the evaluation of proposed apartment buildings in their
MAUT evaluations than in their wholistic evaluations. O'Connor (211)
reported the difficulties in getting many experts to agree on evaluations
of water quality while trying to (a) minimize the amount of
experts' time needed for the evaluation, (b) eliminate redundant or
strongly interrelated attributes, and (c) cope with possible noncompensatory
factors (if the water is loaded with arsenic, nothing
else matters). Guttentag & Sayeki (110) used a MAUT technique to
illuminate the cultural differences in values and beliefs about peace
issues between Japanese and Americans. In one of two reports of real
applications (i.e., working with clients who paid for the advice),
Keeney observed the changes in a MAUT system after two years of use
(145). In the second report, he described the complexities of deciding
where and when to build a new airport in Mexico City (142). Additional
proposals for applications of MAUT, without relevant data, have been
made for the development of social indicators (258), military system
effectiveness (287) and solid waste management (150). Finally, computer
programs to aid elicitation of MAUT have been written (146).

Decision  Analysis 

The  most  general  approach  for  systematically  evaluating  alternative 
actions is decision analysis, an approach developed largely at the
Harvard Business School (221, 235) and two private contract research
firms,  the  Stanford  Research  Institute  (125)  and  Decisions  and  Designs, 
Inc.  (49).  In  facing  a new  problem,  the  analyst  lists  the  decision 
alternatives,  constructs  a model  of  their  interrelations,  assesses  the 
probabilities  of  relevant  contingencies,  finds  out  what  the  decision 
maker  wants  and,  finally,  assays  the  expected  value  or  utility  of  each 
alternative.  To  do  this,  decision  analysts  use  a bag  of  tricks  drawn 
from  crafts  such  as  operations  research,  Bayesian  statistics,  SEU  and 
MAUT,  which  allow  the  analyst  to,  "in  principle,  address  any  decision 
problem  with  unimpeachable  rigor"  (49,  p.  64).  A common  tool  is  the 
decision  tree,  which  diagrams  the  uncertain  consequences  arising  from 
a decision. 
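
The final step, assaying the expected value of each alternative on a decision tree, can be sketched by folding the tree back from its tips. The tuple-based tree encoding, the function names, and the payoffs below are hypothetical; a real analysis would substitute assessed probabilities and utilities.

```python
def expected_value(node):
    """Fold back a decision tree.

    A node is a number (terminal payoff), ("chance", [(prob, child), ...]),
    or ("decision", {label: child}). This encoding is an assumption made
    for the example.
    """
    if isinstance(node, (int, float)):
        return node
    kind, branches = node
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in branches)
    if kind == "decision":
        return max(expected_value(child) for child in branches.values())
    raise ValueError(f"unknown node type: {kind}")


def best_action(decision_node):
    """Label of the alternative with the highest expected value."""
    _, options = decision_node
    return max(options, key=lambda label: expected_value(options[label]))


# Hypothetical choice: launch a product (uncertain payoff) or hold (sure payoff).
tree = ("decision", {
    "launch": ("chance", [(0.6, 100), (0.4, -50)]),
    "hold": 20,
})
print("best action:", best_action(tree))
print("expected value:", expected_value(tree))
```
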

Among the problems that have been given full-dress decision analyses
are whether to seed hurricanes in hopes of reducing their intensity
(126), how to establish planetary quarantine requirements for
trips to Mars and Jupiter (127), what value nuclear power generating
plants have for Mexico (261), and how to design export controls on
computer sales to the Soviet Bloc (71). Many environmental impact
statements, cost-benefit analyses and risk assessments constitute variants
on decision-analytic methodology (55, 91, 198, 216).

Although  many  of  these  analyses  are  already  highly  sophisticated, 
the  basic  methodology  is  still  developing — often  in  response  to  spe- 
cific problems.  Work  in  the  last  five  years  has  increased  our  ability 
to  evaluate  decision  trees  efficiently  (288),  assess  the  value  of 
decision  flexibility  (194),  and  understand  how  models  approximate
the  processes  they  are  intended  to  describe  (276). 

Some  awareness  of  psychological  issues  can  be  found  in  decision 
analysis.  One  example  attempts  to  use  the  best  psychological  scaling 
techniques  for  eliciting  probability  judgments  (260).  Another  emphasizes
communicating  effectively  with  decision  makers;  the  analyst  is
encouraged  to  develop  a role  "not  too  dissimilar  to  that  of  a psycho- 
analyst" (49,  p.  9).  Brown  (48)  raised  a cognitive  problem  that  war- 
rants further  examination.  He  noted  that  decision  analyses  often  fail 
to  model  responses  to  future  events.  As  a result,  when  those  future 
events  actually  occur,  they  are  responded  to  in  totally  unanticipated 
ways,  because  in  the  flesh  they  look  different  than  they  did  at  the 
time  of  the  analysis. 

Man/Machine  Systems 

For  years,  one  of  the  most  promising  areas  in  decision  aiding 
has  been  the  development  of  computerized  aids  for  helping  decision 
makers  cope  with  complex  problems.  Systems  designed  to  elicit  MAUT 
appraisals  fall  into  this  category,  as  do  the  approaches  described
below. 

REGRESSION  APPROACHES  Research  within  the  regression  paradigm  has 
shown  that  people  have  difficulty  both  applying  the  judgmental  poli- 
cies they  wish  to  implement  and  describing  the  policies  they  actually 
are  implementing.  Hammond  and  colleagues  have  developed  computer- 
graphics  systems  to  combat  both  of  these  problems  (113a,  117).  Since 
these  techniques  can  describe  the  policies  of  several  participants 
in  a given  situation,  they  have  been  used  to  resolve  interpersonal  and 
intergroup  conflicts  (39)  and  to  facilitate  policy  formation  at  the 
societal  level  (2,  116). 

Another  major  decision-aiding  technique  is  bootstrapping,  which 
replaces  judges  with  algebraic  models  of  their  own  weighting  policies. 
Recent  research  has  continued  to  demonstrate  that  these  models  perform
as  well  as  or  better  than  the  judges  themselves  (14,  68,  119,  202,
237,  307).  Additional  work  promises  to  further  enhance  the  usefulness
of  bootstrapping.  Einhorn  (81,  82)  showed  how  expert  judgment
and  statistical  techniques  can  incorporate  poorly  defined  and  hard- 
to-measure  variables  into  judges'  models.  Dawes  & Corrigan  (70)  de- 
monstrated that  in  most  situations  the  criterion  being  judged  could 
be  predicted  well  by  models  with  unit  weights  (see  also  83,  297). 

These  unit-weighting  results  suggest  that  in  many  decision  settings, 
all  the  judge  needs  to  know  is  what  variables  to  throw  into  the  equa- 
tion, which  direction (+  or  -)  to  weight  them,  and  how  to  add.  Actually, 
Benjamin  Franklin  had  this  insight  about  unit  weighted  linear  models 
back  in  1772  (187,  p.  27). 
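
The two ideas in this section can be sketched side by side. The example below is an illustration under stated assumptions, not the procedure of any study cited here: bootstrapping is taken to mean an ordinary least-squares regression of a judge's ratings on the cues, unit weighting is taken to mean adding standardized cues with weights of +1 or -1, and the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_judge_model(cues, judgments):
    """Least-squares weights (intercept first) of judgments on cues."""
    rows = [[1.0] + list(r) for r in cues]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, judgments)) for i in range(k)]
    return solve(xtx, xty)


def model_judgment(weights, cue_row):
    """The model's judgment, used in place of the judge's own."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], cue_row))


def standardize(column):
    """Convert a cue to z-scores (assumes the cue is not constant)."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


def unit_weight_scores(cues, signs):
    """Add standardized cues, each weighted +1 or -1."""
    columns = [standardize(list(col)) for col in zip(*cues)]
    return [sum(s * col[i] for s, col in zip(signs, columns))
            for i in range(len(cues))]


# Hypothetical data: five cases, two cues each, and one judge's ratings.
CUES = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
RATINGS = [3.0, 3.5, 6.5, 7.5, 10.0]

weights = fit_judge_model(CUES, RATINGS)
bootstrapped = [model_judgment(weights, row) for row in CUES]
unit_scores = unit_weight_scores(CUES, signs=(+1, +1))
```

The unit-weight version needs no fitting at all: the analyst supplies only the list of cues and the sign of each, which is the point made in the paragraph above.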

PIP  One  of  the  earliest  proposals  for  sharing  the  decision-making
load  between  the  machine  and  the  decision  maker  was  the  Probabilistic
Information  Processing  System  (PIP)  (79).  In  situations  where  judges
must  revise  their  probabilities  upon  receipt  of  new  information,  the 
PIP  system  accepts  the  judges'  subjective  assessments  of  prior  proba- 
bilities, and  of  the  probability  of  each  datum  conditional  on  each 
hypothesis,  and  then  aggregates  them  according  to  Bayes'  theorem  in
order  to  produce  posterior  probabilities  of  the  hypotheses.  A review 
in  1971  (254)  revealed  an  abundance  of  research  on  PIP;  since  then, 
however,  the  flood  has  receded.  A few  recent  studies  have  discussed 
what  to  do  when  the  data  are  not  conditionally  independent  of  one 
another  and  have  examined  how  well  subjects  handle  such  data  (74,  129, 
266).  A couple  of  interesting  medical  applications  have  been  proposed 
(108,  109). 
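
The aggregation step that PIP assigns to the machine can be sketched as follows. This is a minimal illustration, not the implementation of any actual PIP system: it assumes the data are conditionally independent given each hypothesis (the very assumption the recent studies mentioned above call into question), and the function name and the numbers are hypothetical.

```python
from math import prod


def pip_posteriors(priors, likelihoods):
    """Combine judged priors and likelihoods by Bayes' theorem.

    priors:      {hypothesis: P(H)} as assessed by the judge
    likelihoods: one dict per datum, {hypothesis: P(datum | H)}
    Assumes the data are conditionally independent given each H.
    """
    unnormalized = {
        h: p * prod(datum[h] for datum in likelihoods)
        for h, p in priors.items()
    }
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: v / total for h, v in unnormalized.items()}


# Hypothetical assessments: two hypotheses, two data.
priors = {"H1": 0.5, "H2": 0.5}
data = [
    {"H1": 0.8, "H2": 0.4},
    {"H1": 0.6, "H2": 0.3},
]
posteriors = pip_posteriors(priors, data)
```

The division of labor is visible in the signature: every number passed in is a human judgment, and the machine contributes only the multiplication and normalization.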

DYNAMIC  SYSTEMS  Some  of  the  most  ambitious  interactive  man/machine
systems  have  been  developed  to  handle  dynamic  decision-making  situ- 
ations. The  problems  studied  by  researchers  in  this  area  are  extremely 
varied  and  the  systems  developed  to  solve  them  tend  to  be  highly  spe- 
cific. However,  a pattern  of  conceptualizing  the  task,  developing 
the  mathematics  and  software  to  handle  it,  and  then  validating  the 
system  in  one  or  a series  of  experiments  is  common.  As  an  example, 
a team  at  Perceptronics,  Inc.  has  developed  a highly  sophisticated 
system  to  assist  naval  officers  tracking  "the  elements  of  a simulated
fishing  fleet  [one  trawler  and  one  iceberg]  as  it  moves  about  in  an 
expanse  of  ocean,"  a task  that  vaguely  resembles  a futuristic  version 
of  Battleships  (67,  p.  301).  The  system  tracks  the  decision  maker's 
responses  continuously  and  uses  utilities  inferred  from  them  to  recommend
maximum  expected  utility  decisions  (98).  From  an  experiment
testing  the  system  with  12  Naval  Reserve  NCO's  during  four  90-minute
sessions,  Davis  et  al.  (67)  concluded  that  it  worked  in  realistic
decision-making  situations,  was  accepted  by  experienced  operators, 
and  markedly  improved  performance. 

Such  systems  may  be  designed  either  as  products  that  will  actually 
work  in  some  field  situation  or  as  research  tools.  Perhaps  because 
of  their  expense,  most  products  have  been  designed  to  solve  specific 
military  problems  with  no  civilian  analog  (although  readers  concerned 
about  the  possible  presence  of  Soviet  frogpersons  in  their  bathtub  or 
swimming  pool  might  want  to  consult  Irving  [133]).  It  is  difficult 
for  the  non-expert  to  judge  the  validity  of  these  systems  and  the 
acceptability  of  their  advice. 

With  systems  designed  for  research  purposes,  a critical  issue  is 
the  tradeoff  between  realism  and  generality.  One  strategy  is  to  design 
systems  whose  complexity  begins  to  approach  that  found  in  the  real 
world — at  the  risk  of  investing  too  much  of  available  resources  in 
the  machine  and  too  little  in  understanding  how  people  use  it.  Some 
human  factors  questions  worth  studying  are  (a)  how  do  variations  in 
the  basic  system  (e.g.,  different  instructions  or  information  displays) 
affect  people's  performance?  (b)  how  do  person  and  machine  errors
interact?  (c)  how  should  machine  output  be  adjusted  to  different  de- 
cision makers'  cognitive  styles  and  work  paces  (170,  171)?  and  (d) 
when  do  people  heed  the  machine's  advice  (111,  112)? 

Another  problem  with  these  systems  is  that  their  very  complexity 
makes  it  difficult  to  compare  results  from  one  research  context  to 
the  next.  Perhaps  the  only  way  to  do  that  is  to  interpret  the  results 
in  terms  of  basic  psychological  (judgmental)  phenomena.  If  that  tack 
is  taken,  then  one  might  ask  whether  the  development  of  general  behav- 
ioral principles  would  not  be  served  best  by  using  a number  of  simpler, 
cheaper  and  more  flexible  systems,  such  as  the  tactical  and  negotiations 
game  used  by  the  Streuferts  and  colleagues  (e.g.,  269).  Research  show- 
ing why  man/machine  systems  should  be  adopted  might  provide  a more  con- 
vincing case  than  the  demonstration  in  a complex  simulation  that  de- 
cision makers  do  better  with  the  machine's  help.  The  skeptic  may  argue 
that  such  demonstrations  merely  show  that  one  can  design  a simulated 
task  in  which  it  helps  to  have  machine  assistance. 

Using  Decision  Aids 

Do  decision  makers  use  these  sophisticated  techniques?  Boot- 
strapping is  now  being  applied  for  a variety  of  repeated  decisions. 
On  the  other  hand,  apparently  few,  if  any,  PIP  systems  are  operational 
today,  despite  the  mass  of  research  refining  its  methodology.  For 
most  aids,  a clear  picture  is  hard  to  come  by.  In  the  scientific  literature
one  can  find  demonstration  projects  showing  a procedure's  viability.
However,  when  a technique  passes  the  test  of  getting  someone
to  pay  for  it,  the  result  typically  becomes  proprietary.  For  reasons 
of  national  or  industrial  security,  the  details  of  such  projects  are 
not  divulged,  nor  are  the  decision  makers'  responses  to  them.  Most 
overviews  by  those  in  the  decision  aiding  business  understandably  tend 
to  be  quite  optimistic. 

Brown  (47,  49),  however,  has  presented  an  insightful  discussion 
of  factors  that  may  limit  decision  makers'  receptiveness  to  decision 
analysis  and  presumably  to  other  techniques  as  well.  One  is  the  fact 
that  decision  makers  often  employ  an  analyst  to  reduce  the  uncertainty 
in  a problem  situation,  not  to  acknowledge  and  quantify  it.  Another 
source  of  resistance  is  the  absence  of  top-level  decision  makers  famil- 
iar with  the  technique;  a third  is  the  bad  experiences  of  decision 
makers  who  try  to  solo  on  the  technique  without  proper  training.  Brown, 
Kahr  & Peterson  (49)  suggested  that  decision  analysis  is  a clinical 
skill  that  should  only  be  practiced  after  internship  with  an  expert. 

Another  problem  is  that  decision  makers  may,  even  after  careful 
coaching,  reject  the  basic  conception  (e.g.,  the  axioms)  on  which  the 
aids  are  based.  Protocols  of  conversations  between  analysts  and  de- 
cision makers  leave  the  impression  that  decision  makers  are  under 
considerable  pressure  to  adopt  the  analyst's  perspective.  It  is  debatable
whether  satisfaction  with  the  results  of  such  an  analysis  shows
that  the  analyst  has  really  answered  the  decision  maker's  needs. 

Conrath  (58)  and  Reeser  (227)  found  that  decision  makers  reject  decision
analysis  (and  related  techniques)  for  being  both  overly  complicated
and  divorced  from  reality.  Individuals  who  accept  the
assumptions  of  such  analyses  may  still  reject  their  logical  implications
if  they  are  unintuitive  or  too  difficult  to  explain  and
justify  to  others. 

A problem  discussed  earlier  is  whether  decision  makers  can  provide 
the  required  probability,  utility  and  modeling  judgments.  Because  of 
the  vagaries  of  such  judgments,  the  decision  aider  runs  the  risk  of 
grinding  through  highly  sophisticated  analyses  on  inputs  of  very  little 
value.  Certainly  "garbage  in — garbage  out"  applies  to  decision  aiding — 
with  the  particular  danger  that  undue  respect  may  be  given  to  garbage
produced  by  high-powered  and  expensive  grinding.  Relatively  little 
is  known  about  the  sensitivity  of  decision  aids  to  errors  in  elicitation 
and  problem  structuring.  Von  Winterfeldt  & Edwards  (294a)  have  proven 
that  under  very  general  conditions  probability  and  utility  estimates 
can  be  somewhat  inaccurate  without  leading  to  appreciably  suboptimal 
decisions.  Their  proof  is  applicable  to  the  case  where  decision  options 
are  continuous  (e.g.,  invest  X dollars).  However,  Lichtenstein,  Fisch- 
hoff  & Phillips  (175)  have  shown  how  a moderate  error  in  probability 
estimation  can  lead  to  a substantial  decrease  in  expected  utility  when 
the  decision  options  are  discrete  (e.g.,  operate  vs.  don't  operate). 

Von  Winterfeldt  & Edwards  (295)  have  identified  a large  class  of  errors 
which  can  lead  to  large  expected  losses  and  are  extremely  difficult 
to  detect.  They  arise  from  the  selection  of  dominated  decision  alter- 
natives as  the  result  of  inappropriately  modeling  the  decision  problem. 
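
The point about discrete options made above can be illustrated with a two-option choice. The sketch below is a constructed example only: it does not reproduce the analyses of the cited papers, and the payoff table, the probabilities and the function names are all hypothetical.

```python
def expected_utility(p_present, utilities):
    """utilities: (utility if the condition is present, utility if absent)."""
    u_present, u_absent = utilities
    return p_present * u_present + (1 - p_present) * u_absent


def choose(p_estimate, options):
    """Option with the highest expected utility under the estimate."""
    return max(options,
               key=lambda name: expected_utility(p_estimate, options[name]))


def loss_from_misestimate(p_true, p_estimate, options):
    """Expected utility forgone by acting on p_estimate instead of p_true."""
    chosen = choose(p_estimate, options)
    best = choose(p_true, options)
    return (expected_utility(p_true, options[best])
            - expected_utility(p_true, options[chosen]))


# Hypothetical payoffs: (utility if condition present, utility if absent).
OPTIONS = {
    "operate": (90.0, 60.0),
    "don't operate": (0.0, 100.0),
}

# With a true probability of 0.25, an estimate of 0.30 leaves the choice
# unchanged, but an estimate of 0.35 crosses the point at which the two
# options are equally attractive and flips the decision.
small_error = loss_from_misestimate(0.25, 0.30, OPTIONS)
larger_error = loss_from_misestimate(0.25, 0.35, OPTIONS)
```

With these payoffs the two options are equally attractive at a probability of about 0.31. An error that stays on one side of that point costs nothing, while an error that crosses it costs the full difference between the options. That step-like behavior is what distinguishes the discrete case from the continuous one.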

How  much  is  a decision  aid  worth?  This  difficult  question  is 
typically  answered  with  arguments  why  aids  should,  in  principle,  be  worth 
the  resources  invested  in  them.  Recently,  Watson  & Brown  (303)  provided
enlightenment  with  a formal  model  for  performing  a decision
analysis  of  a decision  analysis.  The  model  is  accompanied  by  three 
case  studies  (304)  that  highlight  the  difficulties  of  performing  a 
hindsightful  analysis.  Ironically,  the  greatest  value  of  two  of  these 
analyses  came  from  their  contribution  to  organizational  processes  (re- 
duction of  controversy  and  improvement  of  communication),  considerations 
that  were  left  out  of  the  formal  model  for  the  sake  of  simplicity. 

CONCLUSION 

One  reason  for  the  vitality  of  the  research  described  here  is  the 
increased  importance  of  deliberative  decision  making  in  our  daily  lives. 
In  a non-traditional  society  individuals  must  rely  on  their  analytical 
resources  rather  than  habit  in  guiding  their  affairs.  A rapidly 
changing  and  interrelated  world  cannot  allow  itself  the  luxury  of 
trial  and  error  as  it  attempts  to  cope  with  problems  like  nuclear 
power  and  natural  hazard  management.  Economists,  engineers,  operations 
researchers,  decision  analysts  and  others  are  developing  sophisticated 
procedures  for  these  problems.  It  is  our  job  as  psychologists  to 
remind  them  of  the  human  component  in  implementing  these  techniques 
and  explaining  their  conclusions  to  the  public — in  particular  to  point 
out  the  errors  that  may  arise  from  judgmental  biases.  We  must  help 
the  public  to  make  its  private  decisions  and  to  develop  a critical 
perspective  on  those  decisions  made  in  its  behalf. 